Simulation of a morphological image processor using VHDL - Part I: Mathematical Components by Chen, Wei-chun
Rochester Institute of Technology
2-1-1993
Recommended Citation
Chen, Wei-chun, "Simulation of a morphological image processor using VHDL - Part I: Mathematical Components" (1993). Thesis.
Rochester Institute of Technology. Accessed from
Simulation of A Morphological Image Processor
Using VHDL
Part I: Mathematical Components
Wei-chun Chen
A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of
Master of Science
in
Computer Engineering
Approved by: Professor
George A. Brown (Thesis Advisor)
Professor
Tony H. Chang, Ph. D.
Professor
Roy S. Czernikowski, Ph. D.
Department of Computer Engineering
College of Engineering
Rochester Institute of Technology
Rochester, New York
February 1993
Title of thesis:
Simulation of A Morphological Image Processor Using VHDL
Part I: Mathematical Components
hereby grant permission to the Wallace Memorial Library and Computer Engineering
Department of RIT to reproduce my thesis in whole or in part. Any reproduction will not
be for commercial use or profit.
Date: March 1, 1993
Acknowledgments
We wish to express our grateful appreciation to the many people who helped make this thesis
a reality. We are particularly indebted to Professor George A. Brown for his invaluable
advice and encouragement throughout this entire project. Special thanks are due to Jens
Rodenberg who provided a great deal of information about the MIP system design. We are also
thankful to Professor Roy S. Czernikowski and Tony H. Chang who painstakingly read the
entire manuscript and made valuable suggestions. Our sincere appreciation is extended to
Jeff Hanzlik, Chris Insalaco, Shishir Ghate, and Larry Robin for their previous effort on the
MIP project.
Abstract
Very high speed integrated circuit Hardware Description Language (VHDL) is utilized in
this project to model a Morphological Image Processor (MIP) Array. Both behavioral and
structural models have been established at the system level, and the simulation results from
both models are consistent with each other. The successful implementation of the models
accomplishes our original goal to document the MIP with VHDL. It is observed from the
project that VHDL is a powerful language. It is flexible since it can be used to model any
level of a system independent of the technology.
Glossary
ALU Arithmetic/Logic Unit
ASCII American Standard Code for Information Interchange
BLM Behavioral Language Model
EISA Extended Industry Standard Architecture
FIFO First-In-First-Out
FP Function Processing
FPGA Field Programmable Gate Array
FSP Function Set Processing
IMG IMaGe file format used for frame grabber
MAP Morphological Array Processor
MIP Morphological Image Processor
MU Morphological Unit
RTL Register Transfer Level
SP Set Processing
TIFF Tagged Image File Format for storing and interchanging raster images.
VHDL VHSIC Hardware Description Language
VHSIC Very High Speed Integrated Circuit
VLSI Very Large Scale Integrated circuit
Contents
1 Introduction
2 Morphology Theory
2.1 Digital Images
2.2 Basic Morphological Operations
2.3 Extended Morphological Operations
2.4 Implementation
2.5 Examples
2.5.1 Example 1
2.5.2 Example 2
3 VHDL
3.1 Overview of Top-down Design
3.2 Entity and Architecture
3.3 Signal vs. Variable
4 System Description
4.1 Data Path
4.2 Arithmetic Functional Blocks
4.2.1 ALUs
4.2.2 MU and Volume Adder
4.3 Control and Status Registers
4.3.1 Volume Adder Registers
4.3.2 ALUs
4.3.3 Master Controller
4.3.4 Bus Interface
4.4 EISA Bus Interface
4.5 Operating Procedures
4.5.1 Stage A: Memory / Mask Write
4.5.2 Stage B: ALUs and MU setup
4.5.3 Stage C: MIP Process
4.5.4 Stage D: Memory/Register Read
5 Architecture Partition and Modeling
5.1 Architecture Partition
5.2 Behavioral Model of the MIP
5.2.1 SD Bus Emulation
5.2.2 Bus to Integer Conversion
5.2.3 8/16 Bit Data Transfer
5.2.4 LA Latching
5.2.5 I/O Transfer
5.2.6 Mathematical Operations
5.3 MIP Structural Model
5.3.1 Timing of the MIP
5.3.2 Overview of the MIP
5.4 Simulation
6 Mathematic Units
6.1 ALU1
6.1.1 Ports
6.1.2 Processes
6.2 MAP
6.2.1 Ports
6.2.2 Processes
6.3 ALU2
6.3.1 Ports
6.3.2 Processes
6.4 Volume Adder
6.4.1 Ports
6.4.2 Processes
7 Memory Units
7.1 Memory
7.1.1 Ports
7.1.2 Process
7.2 Buffer
7.2.1 Ports
7.2.2 Process
7.2.3 Further Implementation
8 Conclusion
A Bus Interface
A.1 Input and Output Signals
A.2 VHDL Model of the Bus Interface
B Controller
B.1 Master Controller
B.1.1 Inputs and Outputs
B.1.2 VLSI Version vs. FPGA Version
B.1.3 VHDL Model of the Master Controller
B.2 Memory Controller
B.2.1 Inputs and Outputs
B.2.2 VHDL Model of the Memory Controller
C Utilities
C.1 XEROX 7650 Scanner
C.2 TIFF to IMG
C.3 IMG to ASCII / ASCII to IMG and Display an IMG Image
C.4 ASCII to PS
C.5 Display a PostScript Image on PC
C.6 Connect PC with Apollo Workstations - DPCI
C.7 Print out a PostScript on LaserJet on Apollo
List of Figures
3.1 Top-down Design Process
4.1 MIP Data Path
4.2 Signal Timing of Reset Bus Cycle
4.3 Signal Timing of Register R/W Bus Cycle
4.4 Signal Timing of Memory R/W Bus Cycle
4.5 MIP Process flow
5.1 MIP Hierarchy
5.2 MIP Behavioral Model with PC BUS
5.3 MIP Structural Model with PC BUS
5.4 MIP Timing Chart
5.5 MIP Timing Chart (continued)
5.6 A Partial 32 x 32 image in ASCII format
5.7 Resultant Image, top-left: original image, top-right: first erosion, bottom-right: second erosion, bottom-left: third erosion
5.8 Original 8-bit grey scale image
5.9 Resultant image of a complete 512 x 512 image after erosion
5.10 Resultant image of two 512 x 256 images after erosion
6.1 The Blanking Sequence of the MAP
A.1 Schematic of BUS Interface
B.1 Schematic of Master Controller in the VLSI version
B.2 Schematic of Master Controller in the FPGA version
B.3 Stages of Master Controller
B.4 Schematic of Memory Controller
List of Tables
2.1 Threshold Representation of a Discrete, Quantized Signal
2.2 FSP Dilation, Erosion, Opening, and Closing
2.3 f(u) and g(v)
2.4 illustration of a grey scale dilation
2.5 illustration of a grey scale erosion
4.1 ALUs' Operations
4.2 The Map of Registers in the MIP
4.3 Volume Adder Registers
4.4 ALU Registers
4.5 Start Register
4.6 Control and Status Registers
4.7 Memory Select
4.8 Memory to Local Bus Connection
4.9 On-board Memory Segments
4.10 PC Address Control
4.11 System Commands
5.1 MIP Entity
5.2 The Window Array
7.1 Tri-state Signal Resolving Table
A.1 I/O Address
Chapter 1
Introduction
The aim of this project was to model a Morphological Image Processor (MIP) with VHDL,
a hardware description language. A MIP can be used to analyze an image based on a
predefined geometric shape by applying set operations on the image. The total project was
accomplished by the joint efforts of Wei-chun Chen and Hao Chen due to its complexity
and size. Two separate thesis topics entitled Mathematical Components and Control
Mechanism have resulted from the project to create a VHDL model of the Morphological
Image Processor.
In order to understand the operation of the MIP, the study of the MIP theory is first
presented in Chapter 2 (by Wei-chun Chen), in which digital images are defined, and
morphological operations are explained. Chapter 3 (by Hao Chen) describes the important
concepts in VHDL, and the role of VHDL in the top-down design. In Chapter 4 (by Wei-chun
and Hao), the MIP system is described in detail. The data path between different
buses is explained, the functionality of the arithmetic blocks is discussed, and the
control/status registers are presented. In addition, the interface between the MIP and the host
computer is illustrated, and the procedures for operating the MIP are given. These chapters
are shared by Wei-chun Chen and Hao Chen for readers to better understand the entire
project.
The modeling of the MIP in VHDL is discussed from Chapter 5 to Chapter 9. Based on
the system description in Chapter 4, the MIP is partitioned in Chapter 5 (by Wei-chun and
Hao) into four different functional blocks: I/O Unit, Control Units, Arithmetic Units, and
Memory Units. Each functional block consists of one or more physical blocks. Each physical
block has its corresponding VHDL model. The behavioral and structural models of the MIP
are discussed in Chapter 5 as well. Chapter 6 visits the Arithmetic Units: Arithmetic/Logic
Units (ALU1 and ALU2), Morphology Unit (MU), and Volume Adder. The MU is further
decomposed into First-In-First-Out (FIFO) and Morphological Array Processor (MAP).
Finally, the Memory and the Buffer in the Memory Units are discussed in Chapter 7.
Appendix A and Appendix B are written by Hao Chen to describe the control mechanism
of the MIP. In Appendix A, the Bus Interface in the I/O Unit is described. Appendix B deals with
the Controller Units, which consist of the Master Controller and the Memory Controller.
The original architecture and the FPGA version of the MIP were designed by Jens
Rodenberg and Jeff Hanzlik. The first VLSI version of the MAP was designed by Larry
Rubin. The VLSI version of the ALU1, the ALU2, the Volume Adder, and the revised
VLSI version of the MAP were designed by Shishir Ghate. The VLSI version of the Master
Controller and Memory Controller were designed by Chris Insalaco. All of the BLM models
for the components in the MIP were written by Jeff Hanzlik. The VHDL models of the units
are based on the BLM models, as well as the schematics from various versions mentioned
above.
Chapter 2
Morphology Theory
According to the Webster Electronic Dictionary, the word morphology refers to the study
of form and structure. In image processing, morphology was first used by G. Matheron as a
methodology which analyzes an image based on a predefined geometric shape by applying
set operations on the image. The image operations of mathematical morphology, known as
morphological filters, are more suitable for shape analysis than linear filters [3].
The morphology theory can be applied on either binary or grey-scale images. The
definitions of an image and morphology operations will be presented in Section 2.1. The
hardware implementation of the operations will be explained in Section 2.4.
2.1 Digital Images
A digital image is normally created by the process of sampling a continuous image. The
image can be represented as a function whose domain is a subset of a discrete space and
whose range is a subset of integers. An element in the domain represents the coordinate of
a pixel while an element in the range is the signal strength of a pixel. An image consisting
of monochromatic pixels is referred to as a grey-scale image.
A grey-scale image can be converted to a binary image by applying a thresholding
process. The thresholding process defines the corresponding pixel value as "1" when the
grey-scale pixel value is greater than or equal to the threshold value; otherwise, the pixel
value is "0". The binary image can be represented by a set of coordinates for those pixels with
value "1".
In our discussion, the discrete space is limited to two dimensions for convenience.
A binary image is defined as a set, $X$:
$$X = \{\, x \mid x \text{ is the coordinate of the image},\ x \in Z^2 \,\}. \tag{2.1}$$
$Z$ is the set of integers. $Z^2$ is a two dimensional discrete space.
A grey-scale image is defined as a function, $f$:
$$f : E \to F, \qquad f(x) = \begin{cases} y,\ y \in F & \text{if } x \in E \\ -\infty & \text{otherwise} \end{cases} \tag{2.2}$$
$F$ is the range of the function and $E$ is the domain of the function, with $F \subset Z$ and $E \subset Z^2$.
The thresholding process is accomplished by obtaining thresholding sets for a grey-scale
image. A thresholding set is defined by
$$T_a(f) = \{\, x \mid f(x) \ge a,\ a \in Z,\ x \in E \,\}, \qquad -\infty < a < \infty, \tag{2.3}$$
in which a is a threshold value. It is clear by comparing equations 2.1 and 2.3 that a
thresholding set is a binary image. For an 8-bit grey-scale image, there are 256 threshold
values from 0 to 255. After the thresholding process, the grey-scale image is decomposed
into 256 binary images.
The decomposed grey-scale image can be reconstructed by the operation:
$$f(x) = \max\{\, a : x \in T_a(f) \,\}, \qquad \forall x. \tag{2.4}$$
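The decomposition and reconstruction in equations 2.3 and 2.4 can be illustrated with a minimal Python sketch. The function names `threshold_set` and `reconstruct` are illustrative only and are not part of the MIP design; the signal is the 1-D example of Section 2.5.1, for which thresholds 0 to 3 are sufficient.

```python
def threshold_set(f, a):
    """Return T_a(f): the coordinates x whose grey value f[x] is at least a."""
    return {x for x, value in f.items() if value >= a}


def reconstruct(threshold_sets, domain):
    """Rebuild f(x) as the largest threshold a whose set T_a(f) contains x."""
    return {x: max(a for a, t in threshold_sets.items() if x in t) for x in domain}


# 1-D grey-scale signal from Example 1, coordinates 0..10.
f = dict(enumerate([1, 1, 2, 1, 3, 0, 0, 1, 0, 2, 3]))

# Decompose into one binary image (coordinate set) per threshold value.
sets = {a: threshold_set(f, a) for a in range(4)}
restored = reconstruct(sets, f.keys())

print(sorted(sets[3]))   # coordinates where f(x) >= 3
print(restored == f)     # the decomposition loses no information
```

For an 8-bit image the same loop would run over the 256 threshold values 0 to 255.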
2.2 Basic Morphological Operations
The morphological operations are based on a structuring element to analyze an input image.
Any small sized image with a simple shape can be used as a structuring element. There are
three categories of processing classified by the input images and the structuring elements.
We will define two basic operations in each category: dilation and erosion.
The first category is set processing (SP) with a binary input image and a structuring
element. Let $X_b = X + b = \{x + b : x \in X\}$ denote the vector translate of $X$ by $b$. The dilation
of a binary image $X$ by a binary structuring element $B$ is defined as:
$$X \oplus B = \bigcup_{b \in B} (X + b) = \{\, x + b \mid \forall x \in X \wedge \forall b \in B \,\}. \tag{2.5}$$
Dilating an image by a structuring element $B$ has the effect of "expanding" the image in a
manner determined by $B$.
The erosion of a binary image $X$ by a binary structuring element $B$ is defined as:
$$X \ominus B = \bigcap_{b \in B} (X - b) = \{\, x \mid x \in X \wedge (B + x) \subseteq X \,\}. \tag{2.6}$$
Eroding an image by a structuring element $B$ has the effect of "shrinking" the image in a
manner determined by $B$.
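As a minimal sketch of the two set operations, the union and intersection forms of equations 2.5 and 2.6 can be written directly with Python sets. The 3 x 3 square and the horizontal three-pixel structuring element are hypothetical data chosen only for illustration.

```python
def dilate(X, B):
    """X dilated by B: the union of the translates X + b for every b in B."""
    return {(x0 + b0, x1 + b1) for (x0, x1) in X for (b0, b1) in B}


def erode(X, B):
    """X eroded by B: the intersection of the translates X - b for every b in B."""
    translates = [{(x0 - b0, x1 - b1) for (x0, x1) in X} for (b0, b1) in B]
    return set.intersection(*translates)


# Hypothetical data: a 3 x 3 square of (row, column) coordinates and a
# horizontal structuring element of three pixels centred on the origin.
X = {(r, c) for r in range(3) for c in range(3)}
B = {(0, -1), (0, 0), (0, 1)}

print(sorted(erode(X, B)))   # only the middle column survives the shrinking
print(len(dilate(X, B)))     # the square grows by one column on each side
```

Because this structuring element contains the origin, the eroded set is a subset of X and the dilated set is a superset of X.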
The second category is function set processing (FSP) with a grey-scale input image and
a binary structuring element. An important property of the FSP operations is that they
commute with thresholding. That is, let $\phi$ denote an FSP operation and let $\Phi$ denote its
respective SP operation. Then, we say that $\phi$ commutes with thresholding iff
$$T_a[\phi(f)] = \Phi[T_a(f)], \qquad \forall a \in Z. \tag{2.7}$$
The dilation of a grey-scale image $f$ by a binary structuring element $B$ is defined as:
$$(f \oplus B)(x) = \max_{y \in B}\{\, f(x - y),\ x - y \in E \,\}. \tag{2.8}$$
The erosion of a grey-scale image $f$ by a binary structuring element $B$ is defined as:
$$(f \ominus B)(x) = \min_{y \in B}\{\, f(x + y),\ x + y \in E \,\}. \tag{2.9}$$
As seen from equation 2.7, the FSP operation is equivalent to 256 SP operations for an 8-bit
grey-scale image. Therefore, the FSP operation has received more attention in mathematical
morphology research.
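The commuting property of equation 2.7 can be checked numerically. The sketch below is an illustration under stated assumptions, not the MIP implementation: it reuses the 1-D signal and the structuring set B = {-1, 0, 1} of Section 2.5.1, treats every pixel outside the domain as -∞ (the convention followed by Table 2.2), and uses helper names that are hypothetical.

```python
NEG_INF = float("-inf")


def fsp_dilate(f, B, domain):
    """Equation 2.8: maximum of f(x - y) over y in B (f is -inf off its domain)."""
    return {x: max(f.get(x - y, NEG_INF) for y in B) for x in domain}


def fsp_erode(f, B, domain):
    """Equation 2.9: minimum of f(x + y) over y in B (f is -inf off its domain)."""
    return {x: min(f.get(x + y, NEG_INF) for y in B) for x in domain}


def threshold_set(g, a):
    return {x for x, value in g.items() if value >= a}


def sp_dilate(X, B):
    return {x + b for x in X for b in B}


def sp_erode(X, B):
    candidates = {x - b for x in X for b in B}
    return {x for x in candidates if all(x + b in X for b in B)}


f = dict(enumerate([1, 1, 2, 1, 3, 0, 0, 1, 0, 2, 3]))
B = {-1, 0, 1}
domain = range(-1, 12)            # one pixel wider than the signal on each side

dilated = fsp_dilate(f, B, domain)
eroded = fsp_erode(f, B, domain)

# Thresholding the FSP result must equal the SP result on the thresholded image.
commutes = all(
    threshold_set(dilated, a) == sp_dilate(threshold_set(f, a), B)
    and threshold_set(eroded, a) == sp_erode(threshold_set(f, a), B)
    for a in range(4)
)
print(commutes)
```

Only four thresholds are needed here because the example signal takes values 0 to 3; an 8-bit image would need all 256.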
The third category is function processing (FP) with a grey-scale input image and a
grey-scale structuring element. Let $g$ be a function whose domain is $I \subset Z^2$ and whose
range is $J \subset Z$. The dilation of a grey-scale image $f$ by a grey-scale structuring element $g$
is defined as:
$$(f \oplus g)(x) = \max\{\, f(y) + g(x - y),\ x \in E,\ \forall (x - y) \in I \,\}. \tag{2.10}$$
The erosion of a grey-scale image $f$ by a grey-scale structuring element $g$ is defined as:
$$(f \ominus g)(x) = \min\{\, f(y) - g(y - x),\ x \in E,\ \forall (y - x) \in I \,\}. \tag{2.11}$$
It should be realized that the FP operations do not commute with SP operations.
2.3 Extended Morphological Operations
The utilization of erosion and dilation can be extended to opening and closing. Any set of
the erosion and the dilation operations defined above can be used in an opening or a closing
operation.
The opening operation can be defined as:
$$X \circ B = (X \ominus B) \oplus B. \tag{2.12}$$
The closing operation can be defined as:
$$X \bullet B = (X \oplus B) \ominus B. \tag{2.13}$$
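Opening and closing are simply compositions of the two basic operations. The following sketch, assuming the grey-scale (FSP) forms with a flat structuring set and -∞ outside the evaluation window, applies them to the 1-D signal of Section 2.5.1. The function names and the fixed `window` argument are illustrative choices.

```python
NEG_INF = float("-inf")


def dilate(f, B, window):
    """Maximum of f(x - y) over y in B; pixels missing from f count as -inf."""
    return {x: max(f.get(x - y, NEG_INF) for y in B) for x in window}


def erode(f, B, window):
    """Minimum of f(x + y) over y in B; pixels missing from f count as -inf."""
    return {x: min(f.get(x + y, NEG_INF) for y in B) for x in window}


def opening(f, B, window):
    """Erosion followed by dilation, as in equation 2.12."""
    return dilate(erode(f, B, window), B, window)


def closing(f, B, window):
    """Dilation followed by erosion, as in equation 2.13."""
    return erode(dilate(f, B, window), B, window)


values = [1, 1, 2, 1, 3, 0, 0, 1, 0, 2, 3]
f = dict(enumerate(values))
B = {-1, 0, 1}
window = range(-1, 12)

opened = opening(f, B, window)
closed = closing(f, B, window)
print([opened[x] for x in range(11)])
print([closed[x] for x in range(11)])
```

On the original domain the opened signal never exceeds the input and the closed signal never falls below it, which is the usual qualitative behavior of the two filters.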
2.4 Implementation
Our goal is to design and simulate a grey-scale morphological processing system. In the
system, images and structuring elements will be grey-scale images. A structuring element
is referred to as a mask in future discussions. In equation 2.2 we have shown that the value
-∞ is used to represent an undefined pixel value; therefore, a 9-bit 2's complement code
is used instead of an 8-bit unsigned binary code. The bit pattern 1 0000 0000 (256 when read
as an unsigned number, -256 in 2's complement) is used to represent the -∞ value. Even though
the original image from a sampling does not have negative valued
pixels, the negative value for a pixel can occur during computation. The maximum image
frame size is fixed to 512 by 512 (or 1024 by 1024) pixels for our system while the mask
frame size is 7 by 7 pixels.
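A short sketch of the 9-bit 2's complement pixel code may help here. It assumes that the most negative code is the one reserved for -∞, which is the reading under which the bit pattern with unsigned value 256 stands for the undefined pixel. The names `encode`, `decode`, and `is_undefined` are hypothetical helpers, not signals or functions of the MIP.

```python
BITS = 9
NEG_INF_CODE = -(1 << (BITS - 1))     # -256, assumed here to stand for -infinity
MAX_VALUE = (1 << (BITS - 1)) - 1     # 255, the largest representable pixel value


def encode(value):
    """Return the 9-bit 2's complement bit pattern of value as an unsigned int."""
    if not NEG_INF_CODE <= value <= MAX_VALUE:
        raise ValueError("value does not fit in 9 bits")
    return value & ((1 << BITS) - 1)


def decode(pattern):
    """Interpret a 9-bit pattern as a signed integer (inverse of encode)."""
    return pattern - (1 << BITS) if pattern & (1 << (BITS - 1)) else pattern


def is_undefined(value):
    """True when the signed value is the code reserved for -infinity."""
    return value == NEG_INF_CODE


print(format(encode(NEG_INF_CODE), "09b"))   # sign bit set, all other bits clear
print(decode(0b111111111))                   # all ones is -1
```

The extra bit leaves the full 0 to 255 range of the sampled image intact while making room for negative intermediate results.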
The essential operations in the MIP system are erosion and dilation since the opening
or closing operations can be created using erosion and dilation. We will first explain the
implementation of FP operations, then apply the result to FSP operations.
The definitions of FP operations have been shown in equations 2.10 and 2.11. The
computing procedures for dilation as shown by equation 2.10 can be described by the following
pseudo code:
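for (output_col_index=0; output_col_index<512; output_col_index++)
  for (output_row_index=0; output_row_index<512; output_row_index++)
  {
    /* start below every possible sum so that the first sum always replaces it */
    OUTPUT_IMAGE(output_col_index, output_row_index) := -infinity;
    for (mask_col_index=0; mask_col_index<7; mask_col_index++)
      for (mask_row_index=0; mask_row_index<7; mask_row_index++)
      {
        /* mask_size is 7; the offset aligns the mask center with the target pixel */
        input_row_index := output_row_index - mask_row_index + (mask_size-1)/2;
        input_col_index := output_col_index - mask_col_index + (mask_size-1)/2;
        /* keep the running maximum of image + mask over the whole window */
        OUTPUT_IMAGE(output_col_index, output_row_index) :=
          max(OUTPUT_IMAGE(output_col_index, output_row_index),
              map_add(INPUT_IMAGE(input_col_index, input_row_index),
                      MASK(mask_col_index, mask_row_index)));
      }
  }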
The procedure can be visualized with the following steps:
1. rotate the mask by 180 degrees;
2. align the target pixel (starting from column 0, row 0 of the image) with the center of
the mask;
3. add the corresponding pixels between the mask and the part of the image overlapped
with the mask;
4. choose the maximum value of the summations as the value of the target pixel;
5. slide the mask to the next target pixel and repeat the previous steps until the whole
image is processed.
Comparing equation 2.11 with equation 2.10 shows that the erosion operation can be
computed with the same procedure by negating the mask pixel values and not rotating the
mask. The target value is then the minimum value of the summations instead of the maximum.
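The relation between the two procedures can be sketched in one dimension. The code below is an illustration, not the MAP hardware algorithm: it uses the f and g of Example 2 (Table 2.3), indexes the mask from 0 rather than from its center (so the centering offset of the pseudo code is omitted), and treats pixels outside the image as -∞. The names `max_of_sums` and `min_of_sums` are hypothetical.

```python
NEG_INF = float("-inf")


def max_of_sums(f, mask, x):
    """Dilation step: the mask is rotated 180 degrees, so the image index is x - v."""
    return max(f.get(x - v, NEG_INF) + m for v, m in mask.items())


def min_of_sums(f, mask, x):
    """Erosion step: no rotation (image index x + v); the mask is already negated."""
    return min(f.get(x + v, NEG_INF) + m for v, m in mask.items())


# Data of Table 2.3.
f = {15: 4, 16: 7, 17: -5, 18: 6, 19: 8}
g = {0: 1, 1: 17, 2: -3}

# The negation is the part the text assigns to software on the host PC.
negated_g = {v: -m for v, m in g.items()}

dilation = [max_of_sums(f, g, x) for x in range(15, 20)]
erosion = [min_of_sums(f, negated_g, x) for x in range(15, 20)]
print(dilation)   # [5, 21, 24, 12, 23]
print(erosion)    # [-10, -22, -11, -inf, -inf]
```

Both functions perform only additions followed by a maximum or minimum search, which is why the same hardware can serve either operation once the mask has been prepared.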
The negating and rotating mask procedures for erosion and dilation can be easily
accomplished by a microcomputer such as an IBM PC without enlarging the hardware.
Therefore, the system has been partitioned to compute the summations and search the
maximum/minimum by hardware, and to rotate and negate the mask by software.
The reader should be aware that if a -∞ is used in a mask, the negated value is +∞
which is not defined in our 9-bit 2's complement coded system. This technical problem can
be solved by using the maximum value 255 instead of +∞. The trade off is that the actual
value of 255 is indistinguishable from +∞.
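The software side of this partition can be sketched as two small mask transformations. This is a minimal illustration assuming that -256 is the code used for -∞ and that the mask is held as a list of rows; `negate_mask` and `rotate_mask_180` are hypothetical names, not routines from the MIP host software.

```python
NEG_INF_CODE = -256   # assumed 9-bit code standing for -infinity
MAX_VALUE = 255       # largest 9-bit value, also used in place of +infinity


def negate_mask(mask):
    """Negate every mask value, saturating -(-256) = +256 down to 255."""
    return [[min(-value, MAX_VALUE) for value in row] for row in mask]


def rotate_mask_180(mask):
    """Rotate the mask by 180 degrees: reverse the row order and each row."""
    return [row[::-1] for row in mask[::-1]]


# -256 and -255 both negate to 255, which is the trade off described above.
print(negate_mask([[NEG_INF_CODE, -255, 0, 3]]))
print(rotate_mask_180([[1, 2], [3, 4]]))
```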
The FSP operation can be applied on this system by using a proper mask. In the mask,
0 is used for the image and -∞ for the background.
2.5 Examples
The following two examples are modified based on the works of Haralick [2] and Morgos [4].
2.5.1 Example 1:
We will illustrate through an example in table 2.1 the procedure of thresholding a 1-D
grey-scale image as well as the result of FSP operations on the image. The first row shows the
coordinate of the pixel, while the second row shows the value of the pixel. The resultant
thresholding sets are shown from row 3 to row 6, in which "*" shows that the corresponding
coordinate is an element in the set. The equivalent representation of the thresholding sets is
the binary images which are shown from row 7 to row 10. The original grey-scale function
can be reconstructed by restoring, at each coordinate, the maximum threshold value whose
thresholding set contains that coordinate.
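 1  x         0  1  2  3  4  5  6  7  8  9  10
 2  f(x)      1  1  2  1  3  0  0  1  0  2  3
 3  T_3(f)    .  .  .  .  *  .  .  .  .  .  *
 4  T_2(f)    .  .  *  .  *  .  .  .  .  *  *
 5  T_1(f)    *  *  *  *  *  .  .  *  .  *  *
 6  T_0(f)    *  *  *  *  *  *  *  *  *  *  *
 7  f_3(x)    0  0  0  0  1  0  0  0  0  0  1
 8  f_2(x)    0  0  1  0  1  0  0  0  0  1  1
 9  f_1(x)    1  1  1  1  1  0  0  1  0  1  1
10  f_0(x)    1  1  1  1  1  1  1  1  1  1  1
Table 2.1: Threshold Representation of a Discrete, Quantized Signal ("*" marks a coordinate that belongs to the set, "." one that does not)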
The function f(x) is processed by a structuring set B = {-1, 0, 1} for dilation, erosion,
opening, and closing as shown in table 2.2.
Each result in table 2.2 can be thresholded to generate another set of resultant binary
images. As equation 2.7 shows, these resultant sets should be identical to the sets obtained
by using rows seven to ten of table 2.1 as input and applying the set B for the
same operations.
x           -1   0   1   2   3   4   5   6   7   8   9  10  11
f(x)        -∞   1   1   2   1   3   0   0   1   0   2   3  -∞
f(x) ⊕ B     1   1   2   2   3   3   3   1   1   2   3   3   3
f(x) ⊖ B    -∞  -∞   1   1   1   0   0   0   0   0   0  -∞  -∞
f(x) ∘ B    -∞   1   1   1   1   1   0   0   0   0   0   0  -∞
f(x) • B    -∞   1   1   2   2   3   1   1   1   1   2   3  -∞
Table 2.2: FSP Dilation, Erosion, Opening, and Closing
2.5.2 Example 2:
Table 2.3 lists the coordinates and pixel values of two grey-scale images, f(u) and g(v).
Tables 2.4 and 2.5 show the results computed according to equations 2.10 and 2.11.
u      15   16   17   18   19        v       0    1    2
f(u)    4    7   -5    6    8        g(v)    1   17   -3
Table 2.3: f(u) and g(v)
x = 15                          x = 16
x - y       0     1     2       x - y       0     1     2
y          15    14    13       y          16    15    14
f(y)        4    -∞    -∞       f(y)        7     4    -∞
g(x - y)    1    17    -3       g(x - y)    1    17    -3
f + g       5    -∞    -∞       f + g       8    21    -∞
(f ⊕ g)(15) = 5                 (f ⊕ g)(16) = 21

x = 17                          x = 18
x - y       0     1     2       x - y       0     1     2
y          17    16    15       y          18    17    16
f(y)       -5     7     4       f(y)        6    -5     7
g(x - y)    1    17    -3       g(x - y)    1    17    -3
f + g      -4    24     1       f + g       7    12     4
(f ⊕ g)(17) = 24                (f ⊕ g)(18) = 12

x = 19
x - y       0     1     2
y          19    18    17
f(y)        8     6    -5
g(x - y)    1    17    -3
f + g       9    23    -8
(f ⊕ g)(19) = 23

x             15    16    17    18    19
(f ⊕ g)(x)     5    21    24    12    23
Table 2.4: illustration of a grey scale dilation
x = 15                          x = 16
y - x       0     1     2       y - x       0     1     2
y          15    16    17       y          16    17    18
f(y)        4     7    -5       f(y)        7    -5     6
g(y - x)    1    17    -3       g(y - x)    1    17    -3
f - g       3   -10    -2       f - g       6   -22     9
(f ⊖ g)(15) = -10               (f ⊖ g)(16) = -22

x = 17                          x = 18
y - x       0     1     2       y - x       0     1     2
y          17    18    19       y          18    19    20
f(y)       -5     6     8       f(y)        6     8    -∞
g(y - x)    1    17    -3       g(y - x)    1    17    -3
f - g      -6   -11    11       f - g       5    -9    -∞
(f ⊖ g)(17) = -11               (f ⊖ g)(18) = -∞

x = 19
y - x       0     1     2
y          19    20    21
f(y)        8    -∞    -∞
g(y - x)    1    17    -3
f - g       7    -∞    -∞
(f ⊖ g)(19) = -∞

x             15    16    17    18    19
(f ⊖ g)(x)   -10   -22   -11    -∞    -∞
Table 2.5: illustration of a grey scale erosion
Chapter 3
VHDL
VHDL is an acronym for VHSIC Hardware Description Language. It is an industry standard
language used to describe a digital system from abstract level to concrete level. One of the
important features of the language is that it has constructs that enable a designer to
express the concurrent or sequential behavior of a digital system with or without timing.
The advantage of using VHDL is clearly illustrated in top-down design methodology. We
will discuss the top-down design and the important role of VHDL in the design before
moving on to the topics in VHDL.
3.1 Overview of Top-down Design
In design methodologies, top-down design is very significant since it reduces the design
cycle, increases the design flexibility, and improves the design quality and productivity.
The top-down design methodology is a process that begins with establishing an
architecture and defining a behavioral logic description using a high-level hardware description
language such as VHDL. This could be at a system or block diagram level, or register transfer
level (RTL), but it begins at some point above gate level [5]. Figure 3.1 shows the top-down
design process.
The behavioral model is evaluated against its architecture specifications using realistic
test inputs via simulation to determine its correctness. Once the model is verified, the
designer can further decompose the model into sub-models as desired, and evaluate the
sub-models accordingly. Simulating at successive levels of the design process and making
corrections minimizes or avoids flaws appearing at the most costly stage: when the design is
complete. In addition, simulation using VHDL allows changes to be made more easily
and in much less time. Traditional gate-level simulation discourages trying alternative
approaches since schematic capture, logic simulation, and timing analysis require a great
deal of time and computing resources.
Figure 3.1: Top-down Design Process
The next stage in the process is to select a technology for the design. This is another
advantage of VHDL since it offers verifications independent of the technology. After the
technology is selected, the VHDL models can be synthesized. Logic synthesis provides a
link between VHDL and a netlist. This feature is extremely important for a large system
design since it would otherwise be very difficult to keep track of a full gate-level description
using schematic capture. The gate-level representation obtained after logic synthesis is then
simulated and evaluated against the simulation results from the VHDL model. Next, this
representation is verified through a combination of functional verification, timing analysis,
and fault simulation. The circuit can be further optimized after functional verification or
timing analysis to improve the performance of the design. Layout is then produced, and
more detailed information is provided for additional timing adjustment and verifications.
This process is iterated until the design is ready for manufacture.
After illustrating the importance of VHDL in top-down design, we will discuss some
fundamental concepts in VHDL in the following sections.
3.2 Entity and Architecture
An entity is an abstraction of the actual hardware device. The ENTITY declaration
specifies the name of the entity being modeled and the external interfaces of the modeled device.
An entity can include other entities, and it can also be included in another entity. Therefore,
VHDL supports both top-down and bottom-up design methodologies. The internal details
of an entity are described by an architecture body. Any architecture is associated with only
one entity, but a single entity can have multiple architectures.
In general, the modeling style of a model can be:
• structural;
• behavioral;
• mixed.
In structural modeling, an entity is modeled as a set of components connected by signals.
In the behavioral modeling, however, the entity is modeled by statements describing the
functionality of the device. When the description of an entity contains both structural and
behavioral model elements, it is called mixed level modeling.
It should be mentioned that another popular set of modeling styles includes data flow
modeling in addition to the ones described above. The definition of behavioral modeling
in this case, however, deviates from the definition we gave previously. Here, the behavioral
modeling specifies the behavior of an entity as a set of statements that are executed
sequentially, and the data flow modeling specifies the functionality of the entity by using concurrent
signal assignment statements ([6], p. 16). These definitions relate the modeling to coding
style, i.e., whether a functionality is expressed sequentially or concurrently. We regard
both of these as behavioral modeling since they both describe the FUNCTIONALITY of
an entity.
Our modeling philosophy is that behavioral modeling and structural modeling can
coexist at each level through the hierarchy of the system except for the very bottom level
elements, which can only be pure behavioral model components. The existence of a behavioral
model provides for rapid functional simulation, while the existence of a structural model
allows for final architectural considerations. We will elaborate on these concepts in Chapter 5
through Chapter 9, in which the implementation of VHDL for the MIP is discussed.
In the next section we will discuss the definitions of signal and variable, and other concepts
related to them.
3.3 Signal vs. Variable
A signal is an object that has a past history of values, a current value, and a set of future
values. A variable, on the other hand, is an object which holds a single value of a given
type ([6], p. 28). Signal objects can be regarded as end points of wires in a circuit. The
information a signal object carries is a two-dimensional waveform: the change of its digital
states vs. the change of the time. In order to discuss signal and variable objects in greater
detail, we need to introduce some related concepts.
Event: a term to indicate the change of a signal's value at a specified simulation time.
An event for a signal occurs at the simulation time if the value of the signal changes.
Otherwise, an event does not occur at the simulation time.
Process: the basic unit of execution. The unit contains sequential statements that describe
the functionality of a portion of an entity. A process statement itself is a concurrent
statement. More than one process can be used within an architecture body to capture
the behavior of interacting processes.
Sensitivity List: a set of signals to which the process is sensitive. Any time an event occurs
on any signal in the sensitivity list, the statements in the process will be executed
sequentially. The process suspends after execution of the last sequential statement in the
process and waits for another event on any signal in the sensitivity list to occur.
Delta Delay: a representation of an infinitesimally small delay. This small delay corresponds
to a zero time delay of a device and hence does not correspond to any real simulation
time. Each unit of simulation time can be considered to be composed of an infinite
number of delta delays. The purpose of delta delay is to provide a mechanism for
ordering events on signals that occur at the same simulation time. Therefore, an
event on a signal always occurs at a real simulation time plus an integral number of
delta delays.
Inertial Delay: the amount of delay time for a stable signal to propagate from input to
output of a concurrent element, or statement. If the input signal is not stable during
the specified inertial delay time, no event for the signal will be scheduled. Inertial
delay is often used to filter out unwanted spikes and transients on signals. Since
inertial delay is the most common delay in digital circuits, it is the default delay model.
Transport Delay: the amount of delay time for a signal to propagate from input to output
of a concurrent element, or statement. It models pure propagation delay. Transport
delay model is especially useful for modeling wire delays. Any input pulse, no matter
how small its width, will be propagated to output after the specified delay time.
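The difference between the two delay models can be illustrated with a toy Python model. This is only a sketch of the definitions above, not the VHDL simulation algorithm: a waveform is assumed to be a time-ordered list of (time, value) changes, and the names `transport` and `inertial` are hypothetical.

```python
def transport(events, delay):
    """Every input change reappears at the output, delay time units later."""
    return [(t + delay, v) for t, v in events]


def inertial(events, delay):
    """Changes that do not stay stable for at least `delay` are rejected."""
    out = []
    for i, (t, v) in enumerate(events):
        # Time of the next input change; the last value is held forever.
        next_t = events[i + 1][0] if i + 1 < len(events) else float("inf")
        if next_t - t >= delay:               # held long enough to propagate
            if not out or out[-1][1] != v:    # no event if the output is unchanged
                out.append((t + delay, v))
    return out


# Input waveform with a 2-unit spike between t = 10 and t = 12.
events = [(0, 0), (10, 1), (12, 0), (30, 1)]

print(transport(events, 5))   # the spike is delayed but preserved
print(inertial(events, 5))    # the spike is filtered out
```

With a delay of 5, the transport model keeps all four changes while the inertial model keeps only the initial value and the final stable change.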
The concepts explained above are important to understand the differences between a
signal and a variable. First of all, a variable is different from a signal in terms of the
value assignment. A variable is always assigned a value immediately upon evaluation, but
a signal is assigned a value after the specified delay or a delta delay. Secondly, a signal
has value and time information, but a variable has value only. Thirdly, processes within
an architecture body communicate with each other using signals that are visible to all the
processes. However, variables cannot be used to pass information between processes since
their scope is limited to within a process.
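The first difference, immediate versus deferred assignment, can be mimicked with a toy Python model. The `Signal` class below is a hypothetical illustration of the delta delay idea and not a VHDL simulator: an assignment is only scheduled, and it becomes visible when `update` is called at the end of the delta cycle.

```python
class Signal:
    """Toy signal: an assignment is scheduled and applied one delta cycle later."""

    def __init__(self, value):
        self.value = value
        self._pending = None

    def assign(self, value):
        self._pending = value              # scheduled, not yet visible

    def update(self):
        """Apply the scheduled value; return True when this causes an event."""
        pending, self._pending = self._pending, None
        if pending is None or pending == self.value:
            return False                   # nothing scheduled, or no change
        self.value = pending
        return True


def process(sig):
    """Body of a toy process mixing a variable and a signal assignment."""
    var = sig.value
    var = var + 1          # variable: the new value is usable immediately
    sig.assign(var + 1)    # signal: the old value stays visible for now
    return var, sig.value


sig = Signal(0)
var_seen, sig_seen = process(sig)
event = sig.update()       # end of the delta cycle
print(var_seen, sig_seen, sig.value, event)
```

Inside the process the variable already holds its new value while the signal still reads as 0; the signal takes the value 2 only after the update, and that change is what would wake up processes sensitive to it.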
When a signal is assigned a value inside of a process, it uses a sequential signal
assignment statement; when a signal is assigned a value outside of a process, it uses a concurrent signal
assignment statement. A concurrent signal assignment statement is event-driven so it is
executed whenever there is an event on a signal that appears in its defining expression. A
sequential signal assignment is not event-driven and is executed in the order determined by
the sequential list of statements in a process.
Although there are many other important concepts in VHDL, we have chosen
only a small number of concepts to be discussed in the previous sections since we feel that
an understanding of these concepts is vital to our system modeling and simulation. In the next
chapter, we will describe the functionality of the Morphological Image Processor.
Chapter 4
System Description
4.1 Data Path
In the previous chapter, the important concepts of VHDL have been introduced and
discussed. This chapter will describe the functionality of the MIP. As an image processing
sub-system, the MIP is able to:
• receive commands from the PC;
• receive and store an image from the PC;
• process an image;
• allow the PC to retrieve a processed image.
The transfer of data and commands is accomplished between the Extended Industry
Standard Architecture (EISA) bus on the PC and local buses on the MIP, while the image
processing is done in the ALU1, the MU, and the ALU2. The detailed information is shown
in figure 4.1. The dark colored paths in the figure are the local buses of the system. The
paths with light color are the connective wires. The data transactions and image processing
can be described in four stages:
Load Data: image data are written from the PC to the memory bank through the EISA
bus and the I/O bus. Mask data are written from the PC to the mask registers in the
MU through the EISA bus.
Process Image: during the MIP operation, ALU1 uses an image from the memory bank
through the X1 bus as one operand and an image through the X2 bus as the other operand.
The output of ALU1 is entered into the MU, which then processes the image and
sends out the resultant image as well as the original image to ALU2. ALU2 selects
two of the three input images as its operands. Two inputs are from the MU, and the
third input is from the X2 bus. In each processing cycle, the image from the X2 bus
can be used by either ALU1 or ALU2, but not by both.
Store Output: after the MIP operation, the processed image is stored back to the memory
bank through the Y bus. The volume of the image is stored in the Volume Adder registers,
which are read directly by the PC.
Read Out: the image data in the memory bank can be retrieved by the PC through the
X1, I/O, and EISA buses.
Figure 4.1: MIP Data Path. Any image of appropriate size can be written
to (or read from) the memory bank from (or to) the PC
memory. The morphological mask values are written
directly from the PC to registers in the MU. The image
can be processed by ALU1, MU, and ALU2. The
resultant image from ALU2 is stored back to the memory
bank. The volume adder sums each pixel value in a
frame and sends the result to the PC.
The blocks shown in figure 4.1 are the functional data blocks for the MIP. The control
mechanism is not included in the figure in order to concentrate on the data-flow portion of
the MIP.
4.2 Arithmetic Functional Blocks
The arithmetic functions of the MIP are accomplished by two ALUs, the MU, and the
Volume Adder. The functionality of each component is explained in the following sections.
4.2.1 ALUs
There are two ALUs. ALU1 is the pre-processor for the MU, while ALU2 is the
post-processor for the MU. As the name suggests, the ALUs perform arithmetic and logic
operations. ALU1 uses data from the X1 and X2 buses as its operands, while ALU2 uses
data from two of its three input ports as its operands. These input ports are: the original
and processed images from the MU, and the image from any of the four memories through
the X2 bus. We will refer to the two active input ports in both ALU1 and ALU2 as A and
B in future discussions.
The operation of an ALU depends on the 4-bit binary code shown in table 4.1. The
operations for ALU1 and ALU2 are very similar. The only difference is that the 4x
and 5x hex coded operations, which find the minimum or maximum pixel value of an
image, are available on ALU2 but not on ALU1. Each operation is described below:
0x: compares the corresponding pixels of images A and B, and outputs an image composed
of the minimum of each pair of pixels.
1x: compares the corresponding pixels of images A and B, and outputs an image composed
of the maximum of each pair of pixels.
2x and 6x: copies the input image from A as the output image.
3x and 7x: copies the input image from B as the output image.
4x: searches for the minimum pixel value in image A and stores the result in a register;
it also copies the input image from A as the output image.
5x: searches for the maximum pixel value in image A and stores the result in a register;
it also copies the input image from A as the output image.
8x: subtracts image B from image A. If the value of a pixel in B is ∞, then the output
value of that pixel is ∞.
9x: subtracts a constant from image A. If the value of the constant is ∞, then every
pixel of the output frame is ∞.
Ax and Ex: adds image A to image B.
Bx and Fx: adds a constant value to all pixels in image A.
Cx: subtracts image B from image A. If the value of a pixel in B is ∞, then the output
value of that pixel is the value in A.
Dx: subtracts a constant value from all pixels in image A. If the value of the constant is
-∞, then image A is used as the output.
ALU OPERATIONS
Hex   Bits 7 6 5 4   Operation         ALU1   ALU2
0x    0 0 0 0        min(A,B)          YES    YES
1x    0 0 0 1        max(A,B)          YES    YES
2x    0 0 1 0        copy(A)           YES    YES
3x    0 0 1 1        copy(B)           YES    YES
4x    0 1 0 0        min(A),copy(A)    NO     YES
5x    0 1 0 1        max(A),copy(A)    NO     YES
6x    0 1 1 0        copy(A)           YES    YES
7x    0 1 1 1        copy(B)           YES    YES
8x    1 0 0 0        A-B               YES    YES
9x    1 0 0 1        A-const           YES    YES
Ax    1 0 1 0        A+B               YES    YES
Bx    1 0 1 1        A+const           YES    YES
Cx    1 1 0 0        A-B (2)           YES    YES
Dx    1 1 0 1        A-const (2)       YES    YES
Ex    1 1 1 0        A+B               YES    YES
Fx    1 1 1 1        A+const           YES    YES
Table 4.1: ALUs' Operations
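The per-pixel effect of the op-codes in table 4.1 can be sketched as a VHDL function. This is an illustration only, not the ALU model of the thesis: the package and function names are assumptions, pixels are plain integers, the special infinity cases are not modeled (so 8x and Cx, and 9x and Dx, behave identically here), and the min/max register of operations 4x and 5x is omitted.

```vhdl
-- Sketch only: alu_sketch, opcode_t and alu_op are hypothetical names.
package alu_sketch is
  subtype opcode_t is integer range 0 to 15;   -- upper hex digit of the op-code
  function alu_op (opcode : opcode_t; a, b, k : integer) return integer;
end alu_sketch;

package body alu_sketch is
  -- a and b are corresponding pixels of images A and B; k is the ALU constant.
  function alu_op (opcode : opcode_t; a, b, k : integer) return integer is
    variable result : integer;
  begin
    case opcode is
      when 0 =>                                -- 0x: min(A,B)
        if a < b then result := a; else result := b; end if;
      when 1 =>                                -- 1x: max(A,B)
        if a > b then result := a; else result := b; end if;
      when 2 | 4 | 5 | 6 => result := a;       -- copy(A); 4x/5x register not modeled
      when 3 | 7         => result := b;       -- copy(B)
      when 8 | 12        => result := a - b;   -- A-B, infinity rules omitted
      when 9 | 13        => result := a - k;   -- A-const, infinity rules omitted
      when 10 | 14       => result := a + b;   -- A+B
      when 11 | 15       => result := a + k;   -- A+const
    end case;
    return result;
  end alu_op;
end alu_sketch;
```

With these assumptions, alu_op(0, 7, 3, 0) yields 3 and alu_op(11, 7, 3, 5) yields 12.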
4.2.2 MU and Volume Adder
The Morphology Unit performs erosion or dilation as defined by grey-scale morphological
operations. It adds each mask value to the corresponding pixel in the sub-array of the
image defined by the target pixel. For erosion, the minimum of these sums is chosen as
the value of the target pixel; for dilation, the maximum is chosen.
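The computation for a single target pixel, as described above, can be sketched as follows. The package, type, and function names are assumptions made for illustration; the window is flattened into a one-dimensional integer array, and the window and mask are assumed to be non-empty and to share the same index range.

```vhdl
-- Sketch only: mu_sketch, int_array and mu_pixel are hypothetical names.
package mu_sketch is
  type int_array is array (natural range <>) of integer;
  -- dilate = false selects erosion (minimum); dilate = true selects dilation (maximum).
  function mu_pixel (window, mask : int_array; dilate : boolean) return integer;
end mu_sketch;

package body mu_sketch is
  function mu_pixel (window, mask : int_array; dilate : boolean) return integer is
    variable sum    : integer;
    variable result : integer := 0;
    variable first  : boolean := true;
  begin
    for i in window'range loop
      sum := window(i) + mask(i);        -- add mask value to the pixel under it
      if first then
        result := sum;                   -- first sum initializes the running value
        first  := false;
      elsif dilate and sum > result then
        result := sum;                   -- dilation keeps the maximum
      elsif (not dilate) and sum < result then
        result := sum;                   -- erosion keeps the minimum
      end if;
    end loop;
    return result;
  end mu_pixel;
end mu_sketch;
```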
The Volume Adder takes the output from ALU2 and sums either the squared value or
the absolute value of each pixel to produce the volume of each image. The output is stored
in registers which can be accessed from the PC.
Although not included in figure 4.1, the control mechanism is important for the proper
operation of the MIP. The control and status registers are essential to accomplish this task.
The functionality of these registers is described in the next section.
4.3 Control and Status Registers
Table 4.2 shows the locations and addresses of the on-board control and status registers. The
control and status registers will be described in terms of their functionalities. In each of
the tables used in this section, the ADDR column shows the hexadecimal register address
accessed through the PC bus. The BITS column indicates the corresponding bits for a
certain function. The R/W column shows whether the register is read-only by the PC or
write-only from the PC. If an address is shown as both readable and writable, the address
is actually shared by a read-only and a write-only register. The Content/Purpose column
describes the content of the read-only registers and the purpose of the write-only registers.
LOCATION            REG No.   ADDR
Volume Adder        0         000300
                    1         000301
                    2         000302
                    3         000303
                    4         000304
ALU1                5         000305
                    6         000306
ALU2                7         000307
                    8         000308
Master Controller   9         00030B
                    10        00030C
                    11        00030D
Bus Interface       12        00030E
                    13        00030F
Table 4.2: The Map of Registers in the MIP
4.3.1 Volume Adder Registers
Bit 0 of Reg_0 selects either the original or the squared output from ALU2 as the input
of the Volume Adder, as shown in table 4.3. The 34-bit volume is stored in read-only
registers Reg_4 through Reg_0 in absolute binary format. Bit 7 of Reg_4 indicates that a
negative value has been passed into the Volume Adder.
4.3.2 ALUs
Table 4.4 shows the control and status registers for the operations of the ALUs. Each ALU
requires 9 bits to store a constant in two's complement format. ALU1 uses Reg_5 and bit
0 of Reg_6 to store a constant, while ALU2 uses Reg_7 and bit 0 of Reg_8. Bit 7 through
bit 4 of Reg_6 and Reg_8 are used to store op-codes for ALU1 and ALU2 respectively. Bit
3 and bit 2 of Reg_8 select the inputs of the B and A operands of ALU2, respectively. The
minimum or maximum value for operation 4x or 5x in table 4.1 can be read from Reg_7
and bit 0 of Reg_8.
REG   ADDR     BITS   R/W   Content/Purpose
0     000300   0      W     0 = sum X input, 1 = sum X squared input
               7-0    R     Volume Adder result bits 7-0
1     000301   7-0    R     Volume Adder result bits 15-8
2     000302   7-0    R     Volume Adder result bits 23-16
3     000303   7-0    R     Volume Adder result bits 31-24
4     000304   1,0    R     Volume Adder result bits 33,32
               7      R     negative value flag
Table 4.3: Volume Adder Registers
REG   ADDR     BITS   R/W   Content/Purpose
5     000305   7-0    W     ALU1-const(7:0)
6     000306   0      W     ALU1-const(8)
               7-4    W     ALU1 op-code
7     000307   7-0    W     ALU2-const(7:0)
               7-0    R     ALU2-MAX/MIN(7:0)
8     000308   0      W     ALU2-const(8)
               7-4    W     ALU2 op-code
               3      W     select B (0=MU's X-out, 1=X2-Bus)
               2      W     select A (0=MU's Y-out, 1=X2-Bus)
               0      R     ALU2-MAX/MIN(8)
Table 4.4: ALU Registers
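The bit layout in table 4.4 can be turned into register values with a few integer operations. The sketch below is illustrative only; the package, subtype, and function names are assumptions, and registers are treated as integers in the range 0 to 255.

```vhdl
-- Sketch only: all names in this package are hypothetical.
package alu2_reg_sketch is
  subtype nibble_t is integer range 0 to 15;
  subtype bit_t    is integer range 0 to 1;
  subtype byte_t   is integer range 0 to 255;
  subtype const9_t is integer range -256 to 255;   -- 9-bit two's complement

  function const_low_byte (c : const9_t) return byte_t;   -- value for Reg_7
  function const_bit8     (c : const9_t) return bit_t;    -- bit 0 of Reg_8
  function pack_reg8 (opcode : nibble_t; sel_b, sel_a, c8 : bit_t) return byte_t;
end alu2_reg_sketch;

package body alu2_reg_sketch is
  function const_low_byte (c : const9_t) return byte_t is
  begin
    return c mod 256;            -- mod is non-negative for a positive divisor
  end const_low_byte;

  function const_bit8 (c : const9_t) return bit_t is
  begin
    return (c mod 512) / 256;    -- bit 8 of the 9-bit pattern
  end const_bit8;

  -- Bits 7-4: op-code, bit 3: select B, bit 2: select A, bit 0: const(8).
  function pack_reg8 (opcode : nibble_t; sel_b, sel_a, c8 : bit_t) return byte_t is
  begin
    return opcode * 16 + sel_b * 8 + sel_a * 4 + c8;
  end pack_reg8;
end alu2_reg_sketch;
```

For example, op-code 8 (A-B) with B taken from the X2 bus and A from the MU's Y output gives pack_reg8(8, 1, 0, 0) = 136 (16#88#), and a constant of -3 splits into a low byte of 253 (16#FD#) with bit 8 equal to 1.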
4.3.3 Master Controller
Bit 0 of Reg_9 shown in table 4.5 is an active low signal to start the MIP processing.
REG   ADDR     BITS   R/W   Purpose
9     00030B   0      W     0 → start MIP process
Table 4.5: Start Register
Table 4.6 shows the control and status register for the MIP. Bit 7 and bit 6 of Reg_10 are
used for pipelined operation. When bit 6 is active high, it indicates that the MIP processing
is running and that it is safe to load the new mask or the new image. Bit 7 indicates
that the MIP processing is done and that it is safe to start the next run. Bit 1
of Reg_10 is used to select the X2 bus connection to either ALU1 or ALU2. This selection
must match the ALU operations. Bit 0 is used to select either erosion (when it is 0) or
dilation (when it is 1).
REG   ADDR     BITS   R/W   Purpose
10    00030C   7      R     OK to start next run
               6      R     OK to load next instruction/window
               1      W     bus mode (0 = X2-Bus → ALU1, 1 = X2-Bus → ALU2)
               0      W     MU's max (0=min, 1=max)
Table 4.6: Control and Status Registers
Reg_11 in table 4.7 is used to configure the MIP bus connections in either local mode or
PC mode. If bits 7 to 4 are all zeros, the MIP is in PC mode and one of the memories in the
memory bank should be chosen using bits 3 to 0. This connects one of the memories
to the PC through the X1 bus for a memory read. Otherwise, the MIP is in local mode,
and all 8 bits are used to configure the connection between one of the four memories and
the local buses X1, X2, and Y for MIP processing.
4.3.4 Bus Interface
Reg_12 in Table 4.9 is used to configure the memory segments. Bit 5 of Reg_12 is a flag
for a mask load or a memory access. When the flag is 0, bits 4 and 3 of Reg_12 select
the memory controller which uses the address from the PC bus for either a memory read
or write. When the flag is 1, no memory controller is selected and each memory uses the
address generated by its own memory controller.
The PC address control is detailed in table 4.10. Bit 6 of Reg_13 enables write capability
to the on-board memories and mask registers when it is set to 1. Its default value is 0. Bits
4 through 0 of Reg_13 set up the source address for a memory write or the destination
address for a memory read. This address information is compared with the addresses
on the SA bus to determine whether the address on the SA bus is valid.
MEMORY SELECT: PC MODE
REG   ADDR     BITS   R/W   Purpose
11    00030D   7-4    W     0000 for PC mode setup
               3      W     connect memory 3 to PC
               2      W     connect memory 2 to PC
               1      W     connect memory 1 to PC
               0      W     connect memory 0 to PC
MEMORY SELECT: LOCAL MODE
REG   ADDR     BITS   R/W   Purpose
11    00030D   7,6    W     memory 3 to local bus connect (see Table 4.8)
               5,4    W     memory 2 to local bus connect (see Table 4.8)
               3,2    W     memory 1 to local bus connect (see Table 4.8)
               1,0    W     memory 0 to local bus connect (see Table 4.8)
Note: refer to Table Y if bits 7:4 are all zeros
Table 4.7: Memory Select
BITS              SELECT
2×i+1   2×i
0       0         memory → X1-bus
0       1         memory → X2-bus
1       0         Y-bus → memory
1       1         no connection
i is the memory No.
Table 4.8: Memory to Local Bus Connection
REG   ADDR     BITS   R/W   Purpose
12    00030E   5      W     mask load select: 0 → memory load, 1 → mask load
               4,3    W     memory controller select = bit(4) × 2 + bit(3)
Table 4.9: On-board Memory Segments
REG   ADDR     BITS   R/W   Purpose
13    00030F   6      W     write enable for on-board memory and mask registers
                            0 → write disable (power-on default)
                            1 → write enable
               4-2    W     PC BASE ADDR:
                            (bit(4) × 4 + bit(3) × 2 + bit(2)) × 200000H
               1,0    W     PC OFFSET ADDR:
                            bit(1) × 100000H + bit(0) × 080000H
Table 4.10: PC Address Control
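The address arithmetic in table 4.10 can be written out as a small function. This is a sketch under assumptions: the package and function names are hypothetical, the register is passed as an integer in the range 0 to 255, and the hexadecimal weights are read from the table as 200000H, 100000H, and 080000H.

```vhdl
-- Sketch only: pc_addr_sketch and pc_address are hypothetical names.
package pc_addr_sketch is
  function pc_address (reg13 : integer range 0 to 255) return integer;
end pc_addr_sketch;

package body pc_addr_sketch is
  function pc_address (reg13 : integer range 0 to 255) return integer is
    constant BASE_STEP : integer := 16#200000#;   -- weight of bits 4-2
    constant OFFSET_HI : integer := 16#100000#;   -- weight of bit 1
    constant OFFSET_LO : integer := 16#080000#;   -- weight of bit 0
    variable base_sel, bit1, bit0 : integer;
  begin
    base_sel := (reg13 / 4) mod 8;   -- bit(4)*4 + bit(3)*2 + bit(2)
    bit1     := (reg13 / 2) mod 2;
    bit0     := reg13 mod 2;
    -- Bit 6 (write enable) does not contribute to the address.
    return base_sel * BASE_STEP + bit1 * OFFSET_HI + bit0 * OFFSET_LO;
  end pc_address;
end pc_addr_sketch;
```

For example, writing 16#45# to Reg_13 (write enable set, bits 4-0 equal to 00101) gives 16#200000# + 16#080000# = 16#280000#. Under this reading the five address bits simply select a multiple of 16#080000#.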
4.4 EISA Bus Interface
The MIP board communicates with a PC through the EISA bus. The EISA bus
includes a 16-bit data bus, a 24-bit memory address bus, and various control signals. The
operation of the MIP requires only a subset of the whole bus protocol. The required signals
are: RESET DRV, BCLK, BALE, SA(0 : 19), LA(17 : 23), SD(0 : 15), MEMCS16,
MEMW, MEMR, IOW, and IOR. The following series of definitions gives detailed
functional descriptions of the signals mentioned above.
RESET DRV: RESET DRV (reset driver) is an output signal that is held active high
during system power-on sequences. It remains high until all levels have reached their
specified operating range; then it goes inactive low. In addition, the RESET DRV
line is brought active high if any power level falls outside its specified operating range
after a power on. This signal is called RESET on the MIP and is used to provide a
power-on reset to bring the MIP to a known state before its operation.
BCLK: The BCLK (bus clock) signal is an output signal providing an 8 MHz clock
frequency for the MIP.
BALE: The BALE (bus address latch enable) is an output signal. This signal goes active
high prior to the address bus being valid and falls to inactive low after the address
bus is valid. It is used to latch the address information for the MIP.
SA(19 : 0): Address bits SA19 through SA0 are output signals used to address the
system-bus attached memory and I/O. These signal lines are driven during system-bus cycles
for memory read, memory write, I/O read, and I/O write operations.
LA(23 : 17): Unlatched address bits LA23 through LA17 are output signals used to provide
memory address information about the present bus cycle. These address signals, unlike
SA19-SA0, are only valid for a small portion of the addressing cycle. The information
provided by these address signals is latched on the falling edge of the BALE signal
by the MIP.
SD(15 : 0): Data bits SD15 through SD0 are bidirectional signals that support the
transfer of data between the computer and the MIP.
MEMCS16: The MEMCS16 active low signal is used to indicate a 16-bit data transfer
on the present bus cycle. The signal is called MEMCS16n on the MIP.
MEMW: The MEMW signal is an active low output signal used to write data from the
system bus into memory. The signal is called MEMWn on the MIP.
MEMR: The MEMR signal is an active low output signal used to request data from the
memory. The signal is called MEMRn on the MIP.
IOW: The IOW signal is an active low output signal. It indicates that the address bus
contains an I/O port address and the data bus contains data to be written into the
I/O register of the MIP. The signal is called IOWn on the MIP.
IOR: The IOR signal is an active low output signal. It indicates to the I/O port that the
bus cycle is an I/O port-read cycle and the address bus contains an I/O port address.
The I/O register on the MIP will respond by placing its data on the system data bus.
The signal is called IORn on the MIP.
The timing information for the above signals can be summarized by reset, I/O read, I/O
write, memory read, and memory write operations. Figure 4.2 shows the signal timing for
the reset bus cycle. The RESET pulse lasts for 1250 ns, during which all the other signals
are disabled. 60 ns after RESET becomes inactive, BCLK starts to generate clock pulses.
The clock frequency is about 8 MHz. Figure 4.3 shows the signal timing of the register
write/read bus cycles. The starting 0 ns point indicates the time at which the command
starts. The only difference between the read and write cycles is the timing on the data
bus. For a write operation, data is valid after IOW becomes active low and stays valid
until IOW becomes inactive. For a read operation, the data is required to be valid for some
time before and after IOR becomes inactive. This is determined by the setup time and the
hold time. Figure 4.4 shows the signal timing of the memory write/read bus cycles. The
timing diagrams are based on simulation results from the PCBUS.BLM model written by
Jeff Hanzlik [8]. The above operations are controlled by the system through a series of I/O
commands. These commands are listed in table 4.11.
Figure 4.2: Signal Timing of Reset Bus Cycle (signals shown: RESET, BCLK, BALE, MEMRn, MEMWn, IORn, IOWn)
Figure 4.3(a): Signal Timing of Register Write Bus Cycle (time axis 0 to 840 ns)
Figure 4.3(b): Signal Timing of Register Read Bus Cycle (time axis 0 to 840 ns)
Figure 4.4(a): Signal Timing of Memory/Mask Write Bus Cycle (time axis 0 to 840 ns)
Figure 4.4(b): Signal Timing of Memory Read Bus Cycle (time axis 0 to 840 ns)
Commands                       Purpose
Reset                          causes the RESET signal on the PC bus to become active
IOR <address>                  reads 8-bit data from MIP's output register to the
                               specified I/O port
IOW <address>                  writes 8-bit data from the specified I/O port to
                               MIP's input register
PICLD <address> <image file>   writes 9-bit data of a frame from the specified
                               image file to the memory
MASKLD <address>               writes 9-bit data of a window to the mask
PICRD <address>                reads 9-bit data of an image from the MIP to the system
Table 4.11: System Commands
In the next section, we will discuss how to use these commands for an operation.
4.5 Operating Procedures
The system commands described in the previous section are used to control the MIP's
operation. The simulation of the MIP shows that the execution sequence of commands
is very important. The sequence is shown in figure 4.5 in the form of a flow chart which
illustrates the MIP's operation. The grey blocks in the figure represent the stages that the
MIP's operation steps through. Each of the dotted blocks is an operating unit which
contains a sequence of necessary register and memory operations represented by solid black
rectangles. The following subsections discuss each of the stages in the MIP operation.
4.5.1 Stage A: Memory / Mask Write
The procedure starts by writing the image data and mask values into the MIP board. In
theory, the memory write and mask write can be executed in arbitrary order. In practice,
however, it is necessary to first configure the MIP to load the mask before starting the MIP
process due to design constraints.
Memory write: writes an image frame to an on-board memory. Reg_13 and Reg_12
(addresses 00030F and 00030E) must be set up before the operation.
IOW 00030F <data field> enables the memory and mask write, and sets the
PC memory source address (base and offset) for the image.
IOW 00030E <data field> configures memory/mask flag to memory load,
and selects the memory controller which controls the memory receiving the data.
PICLD <address> writes the image frame to the selected on-board memory
starting at <address+2>.
Mask write: writes a mask to the MU. Reg_13 and Reg_12 (addresses 00030F and
00030E) must be configured before the operation.
IOW 00030F <data field> enables memory and mask write, and sets PC
memory source address (base and offset) for the mask.
IOW 00030E <data field> sets memory/mask flag to mask.
MASKLD writes mask to registers in MU.
Figure 4.5: MIP Operation Flow Chart
A: Memory/Mask Write - write to on-board memory or mask according to addresses
provided by the PC bus (blocks: PC Address Setup, Memory Write, Mask Write).
B: ALUs and MAP Setup - set up constants and operations in ALU1, MAP, and
ALU2 (blocks: ALU1 Setup, ALU2 Setup, MAP Setup).
C: MIP Process - configure memory to local buses; start ALU1, MAP, and ALU2
operations (blocks: Buses Configured to Local Mode, MIP Process).
D: Memory/Registers Read - read the on-board memory or registers' values to PC memory
according to addresses provided by the PC bus (blocks: PC Address Setup, Buses
Configured to PC Mode, Register Read, Memory Read).
4.5.2 Stage B: ALUs and MU setup
After loading the memory and the mask, the ALU operations, the MU operation, and the
local bus configurations can be set up in arbitrary order.
ALU1 SETUP: sets up ALU1 constant and operation by the following commands:
IOW 000305 <data field>
IOW 000306 <data field>
ALU2 SETUP: sets up ALU2 constant and operation; connects the ALU2 A input to
either X2 bus or MU's Y output; connects the ALU2 B input to either X2 bus or
MU's X output. The commands are:
IOW 000307 <data field>
IOW 000308 <data field>
MU and X2 BUS SETUP: sets up the X2 bus connection to ALU1 or ALU2 and the
maximum or minimum operation of MU by the command:
IOW 00030C <data field>
4.5.3 Stage C: MIP Process
At this stage, the local bus mode must be set up before starting the MIP process.
IOW 00030D <data field> configures the connections between memory and local
buses.
IOW 00030B starts the MIP operation.
SKIP: skips the number of PC bus clock cycles determined by the MIP board before
the resultant image can be fetched by the system.
4.5.4 Stage D: Memory/Register Read
After the MIP processing is done, the resultant image and the volume can be read back to
the PC in arbitrary order.
Memory Read: reads the image from the MIP to the PC.
IOW 00030E <data field> configures memory/mask flag to memory load, and
selects the memory controller which controls the memory sending the data.
IOW 00030D <data field> configures the connection between the on-board
memory and the PC memory.
PICRD reads the memory.
Register Read: reads registers 000300-000304, 000307, 000308, and 00030C.
IOR <address>
Chapter 5
Architecture Partition and
Modeling
The data path of the MIP has been described in Chapter 4. In this chapter we will discuss
the architecture of the MIP as well as the behavioral and the structural models of the entire
MIP system.
5.1 Architecture Partition
Figure 5.1 shows the architectural hierarchy of the MIP. The hierarchy is based on the
functional blocks of the MIP. A functional block identified with a square indicates an actual
circuit component encapsulated by a VHDL model. A functional block identified with an
ellipse is a virtual component used for classification. We will briefly describe the functionality
of each block in turn.
The Morphological Image Processor is a system model for the whole MIP. Two
independent models are designed to emulate the system for different usages: a stand-alone
behavioral model is used to emulate the system's behavior using minimum simulation
resources; a structural model is constructed using lower level functional blocks carrying the
architecture information. The structural model consists of four functional blocks: I/O
Unit, Control Units, Arithmetic Units, and Memory Units.
The I/O Unit is the interface between the host computer and the MIP board. It accepts
commands from the host computer and distributes the commands to the other units.
The Control Units enclose both the Master Controller and the Memory Controller.
The Master Controller is responsible for timing and blanking of the MAP's operation, while
a Memory Controller generates the correct address for the corresponding memory during
either a read or a write memory cycle.
The Arithmetic Units include four functional blocks: ALU1, MU, ALU2, and Volume
Adder. These units perform the computing functions described in the previous chapter. The
MU consists of the MAP and the FIFO. FIFOs are used to provide the line delay for the
input to the MAP. The MAP performs the morphological operations described in the
previous chapter.
The Memory Units include the Memory, which is an on-board memory chip, and the
Buffer, which is a tri-state I/O buffer. The on-board memory is used to store the original
image or a processed image, while a buffer is used to control the signal flow.
Figure 5.1: Architectural Hierarchy of the MIP
In order to implement the functionality of the MIP in hardware, Jeff Hanzlik and Jens
Rodenberg designed the original architecture and implemented a prototype board using
Field Programmable Gate Arrays (FPGAs). FPGAs are suitable for prototype circuit
design because of their low cost and fast turnaround. After the architecture was verified by the
FPGA version, Larry Rubin, Chris Insalaco, and Shishir Ghate began a design to implement
the same architecture with fully customized VLSI devices to improve the circuit speed of
the process.
The differences between the FPGA version and the VLSI version are:
1. the memory controller chip contains one memory controller in the FPGA version, but two
in the VLSI version.
2. the MAP is composed of 26 chips in the FPGA version, but 7 chips in the VLSI version.
3. ALU1 and ALU2 are identical devices in the VLSI version.
4. the image size in the FPGA version is fixed at 512 × 512. However, the image size in
the VLSI version can be either 1024 × 1024 or 512 × 512.
In the VHDL models, the architecture has been partitioned according to the functionality of
the MIP shown in Figure 5.1. These functional blocks will be suitable for applying different
technologies with the same architecture through synthesis or manual conversion without
modifying the architecture.
Ports           I/O   Description
RESET           I     SYSTEM RESET, active high
BCLK            I     EISA bus clock
BALE            I     Bus address available, active high
LA(23:17)       I     Base address bus
SA(19:0)        I     Segment address bus
SD_in(15:0)     I     Data input bus*
IORn            I     Registers read, active low
IOWn            I     Registers write, active low
MEMRn           I     Memory read, active low
MEMWn           I     Memory/Mask write, active low
SD_out(15:0)    O     Data output bus*
MEMCS16n        O     8/16 data transmit mode, active low
Frame(1:0)      I     Applied frame size mode
Table 5.1: MIP Entity
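The port list in table 5.1 corresponds to an entity declaration of roughly the following shape. This is a sketch, not the thesis source: the entity name mip is an assumption, and the standard types bit and bit_vector stand in for the Mentor Graphics qsim_state types used in the actual models (so high impedance cannot be represented here).

```vhdl
-- Sketch only: entity name and port types are assumptions.
entity mip is
  port (
    RESET    : in  bit;                        -- system reset, active high
    BCLK     : in  bit;                        -- EISA bus clock
    BALE     : in  bit;                        -- bus address available, active high
    LA       : in  bit_vector(23 downto 17);   -- base address bus
    SA       : in  bit_vector(19 downto 0);    -- segment address bus
    SD_in    : in  bit_vector(15 downto 0);    -- data input half of the SD bus
    IORn     : in  bit;                        -- registers read, active low
    IOWn     : in  bit;                        -- registers write, active low
    MEMRn    : in  bit;                        -- memory read, active low
    MEMWn    : in  bit;                        -- memory/mask write, active low
    SD_out   : out bit_vector(15 downto 0);    -- data output half of the SD bus
    MEMCS16n : out bit;                        -- 8/16 data transmit mode, active low
    Frame    : in  bit_vector(1 downto 0)      -- frame size mode (simulation only)
  );
end mip;
```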
5.2 Behavioral Model of the MIP
Both the behavioral and structural models of the MIP emulate the MIP system described in the
previous chapter. A testbench should be provided in VHDL to test a model. Unfortunately,
the testbench requires file I/O capability, which is not implemented in our design tool,
Mentor Graphics' System 1076 version 7.0. In order to read the system's commands and
input images and to store the output results, we used the BLM model of PC BUS written by
Jeff Hanzlik as the testbench. Figure 5.2 shows the schematic in which the BLM model of PC
BUS and the VHDL model of the MIP are connected by Mentor Graphics' Neted. The I/O
ports in the entity are shown in table 5.1. Note that the SD_in and SD_out
ports are actually the bi-directional bus SD on the EISA bus. Since System 1076 version 7.0
does not support the INOUT port type, we split the SD bus into SD_in and
SD_out, and connected these two ports outside the VHDL models. The port Frame is used
to configure the image size, which can be smaller than 512 × 512 for simulation purposes.
The actual hardware does not include this port.
The design goal of the MIP's behavioral model is to obtain simulation results to compare
with the design specifications of the MIP. The processes in the model are designed based
on:
1. the sources and destinations of the system commands in table 4.11. The processes
are: mip_IOW_process for register write, mip_MEMW_process for memory and mask
write, and mip_SD_output_process for register and memory read.
2. the morphological operation. The processes are mip_MAP_process for all of the image
processing operations and mip_pipeline_process for the status signals of
pipelined processing.
3. the bi-directional signal emulation. The processes are mip_SD_in and mip_SD_out.
4. the bus I/O conversion. "Convert" at the end of a process name indicates
the conversion feature of the process.
Figure 5.2: MIP Behavioral Model with PC BUS
5.2.1 SD Bus Emulation
The SD is a bi-directional tri-state data bus in the EISA bus protocol. A port connects with
SD through a pair of tri-state buffers (the behavioral model of the tri-state buffer and the
resolving function are discussed in chapter 7). In a fully implemented 1076 VHDL system,
this port should be defined as a resolved tri-state signal with an INOUT port type. In this
version, the ports SD_in and SD_out are used to mimic the SD bus for input and output
respectively. As shown in the schematic, these two ports are connected outside the VHDL
model and resolved by PCBUS.BLM. SD_BUF_IN is the unidirectional buffer for SD_in.
It is turned on when either IOWn or MEMWn is active. On the other hand,
SD_BUF_OUT is the unidirectional buffer for SD_out. It is turned on when either IORn
or MEMRn is active.
mip_sd_in:
PROCESS
BEGIN
  wait on SD_in, IOWn, MEMWn, RESET;
  -- Pass the input bus through only during a register or memory write.
  if IOWn='0' or MEMWn='0' then
    SD_BUF_IN <= SD_in;
  else
    -- 16 bits of high impedance, matching SD_in(15:0)
    SD_BUF_IN <= "ZZZZZZZZZZZZZZZZ";
  end if;
END PROCESS mip_sd_in;

mip_sd_out:
PROCESS
BEGIN
  wait on SD_BUF_OUT, IORn, MEMRn, RESET;
  -- Drive the output bus only during a register or memory read.
  if IORn='0' then
    SD_out <= transport SD_BUF_OUT after IOR_DELAY;
  elsif MEMRn='0' then
    SD_out <= transport SD_BUF_OUT after MEMR_DELAY;
  else
    SD_out <= "ZZZZZZZZZZZZZZZZ";
  end if;
END PROCESS mip_sd_out;
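For comparison, on a simulator that supports INOUT ports and resolved signals, the SD bus could be modeled with a single port. The sketch below is not part of the thesis models: it assumes the IEEE std_logic_1164 package instead of the Mentor Graphics qsim_state types, the entity name is hypothetical, and the delays used in the processes above are omitted.

```vhdl
-- Sketch only: sd_port_sketch is a hypothetical entity using IEEE 1164 types.
library ieee;
use ieee.std_logic_1164.all;

entity sd_port_sketch is
  port (
    IORn, MEMRn : in    std_logic;
    SD          : inout std_logic_vector(15 downto 0)
  );
end sd_port_sketch;

architecture behav of sd_port_sketch is
  signal SD_BUF_IN  : std_logic_vector(15 downto 0);
  signal SD_BUF_OUT : std_logic_vector(15 downto 0) := (others => '0');
begin
  -- Input side: the bus value is always observable inside the model.
  SD_BUF_IN <= SD;

  -- Output side: drive the bus only during a read cycle and release it
  -- (high impedance) otherwise, so the resolution function lets the other
  -- bus driver win during write cycles.
  SD <= SD_BUF_OUT when (IORn = '0' or MEMRn = '0') else (others => 'Z');
end behav;
```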
5.2.2 Bus to Integer Conversion
Ideally the on-board memory can be modeled by using the qsim_state_vector for the address
and content signals. The qsim_state is a non-standard type defined by Mentor Graphics.
There are four logic states in qsim_state: '0' and '1' for logic 0 and 1, 'X' for unknown state,
and 'Z' for high-impedance state. The qsim_state_vector is a type which is an unconstrained
array of qsim_state. The data structure of the memory requires an array of qsim_state_vector.
Unfortunately, the two-dimensional array type is not supported by System 1076 version 7.0.
Therefore, the data structure of the on-board memory and mask registers must use a
one-dimensional integer array. The address signals, LA and SA, are converted into integers as
the index of the memory array, while the data signals, SD_in and SD_out, are converted into
integers as the memory contents.
The following process, mip_SD_read_and_convert, is one of the converting processes. The
process converts the vector SD_BUF_IN into an integer signal, SD_IN_REG, whenever an
event occurs on SD_BUF_IN.
mip_SD_read_and_convert:
PROCESS
  VARIABLE temp_value: integer;
  VARIABLE temp_signal: qsim_state_vector(8 downto 0);
BEGIN
  wait on SD_BUF_IN;
  -- Copy the low 9 bits of the bus, then convert them to an integer.
  for i in temp_signal'LENGTH-1 downto 0 LOOP
    temp_signal(i) := SD_BUF_IN(i);
  end loop;
  in_gen(temp_signal, temp_value);
  SD_IN_REG <= temp_value;
END PROCESS mip_SD_read_and_convert;
The in_gen procedure converts a qsim_state_vector into an integer.
337 PROCEDURE in_gen (input : IN qsim_state_vector;
338                   value : OUT integer) IS
339   VARIABLE temp, i: integer;
340 BEGIN
341   temp:=0;
342   FOR i IN input'LENGTH-1 DOWNTO 0 LOOP
343     CASE (input(i)) IS
344       WHEN '1' =>
345         temp := temp*2+1;
346       WHEN '0' =>
347         temp := temp*2;
348       WHEN 'X' | 'Z' =>
349         temp := UNKNOWN;
350         EXIT;
351     END CASE;
352   END LOOP;
353   value := temp;
354 END in_gen;
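The conversion performed by in_gen can be paraphrased outside VHDL. The following Python sketch is only an illustration of the same loop, not part of the MIP model: the vector is represented as a string with the most significant bit first, and the value chosen for UNKNOWN is a hypothetical sentinel, since the actual constant is not given in this section.

```python
# Illustrative sketch of in_gen. UNKNOWN is a hypothetical sentinel value;
# the constant used by the VHDL model is not stated in this section.
UNKNOWN = -(2 ** 31)


def in_gen(bits: str) -> int:
    """Convert a string of '0', '1', 'X', 'Z' states (MSB first) to an integer."""
    temp = 0
    for state in bits:  # the leftmost character plays the role of input(LENGTH-1)
        if state == "1":
            temp = temp * 2 + 1
        elif state == "0":
            temp = temp * 2
        elif state in ("X", "Z"):
            # A single unknown or high-impedance bit makes the whole value unknown.
            return UNKNOWN
        else:
            raise ValueError(f"not a qsim_state literal: {state!r}")
    return temp


if __name__ == "__main__":
    print(in_gen("000000101"))
    print(in_gen("0000Z0101") == UNKNOWN)
```

As in the VHDL procedure, the result is an unsigned accumulation of the bits; any sign interpretation is left to the caller.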
For the 9-bit signed integers in the model, the procedure out_gen converts them into 16-bit
qsim_state_vectors.
362 PROCEDURE out_gen (value : IN integer;
363                    output : OUT qsim_state_vector(15 DOWNTO 0)) IS
364   VARIABLE i, choice, temp: integer;
365 BEGIN
366   temp := value;
367   IF (temp /= UNKNOWN) THEN
368     IF (temp < 0) THEN
369       temp := temp + 2**(WORD_LENGTH);
370     END IF;
371     FOR i IN 0 TO 15 LOOP
372       choice := temp mod 2;
373       CASE (choice) IS
374         WHEN 1 =>
375           output(i) := '1';
376         WHEN 0 =>
377           output(i) := '0';
378         WHEN OTHERS =>
379           NULL;
380       END CASE;
381       temp := temp / 2;
382     END LOOP;
383   ELSE
384     FOR i IN 0 TO 15 LOOP
385       output(i) := 'X';
386     END LOOP;
387   END IF;
388 END out_gen;
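The reverse conversion can be illustrated in the same way. The Python sketch below mirrors the structure of out_gen under two assumptions that are not stated in this section: WORD_LENGTH is taken to be 9 (matching the 9-bit signed integers mentioned above), and UNKNOWN is a hypothetical sentinel value. The result is returned as a string with bit 15 first.

```python
# Illustrative sketch of out_gen. WORD_LENGTH = 9 and the UNKNOWN sentinel
# are assumptions made for this example only.
UNKNOWN = -(2 ** 31)
WORD_LENGTH = 9


def out_gen(value: int, width: int = 16) -> str:
    """Return a width-character string of '0'/'1'/'X', most significant bit first."""
    if value == UNKNOWN:
        return "X" * width
    if value < 0:
        # Map a negative value onto its two's complement code in WORD_LENGTH bits.
        value += 2 ** WORD_LENGTH
    bits = []
    for _ in range(width):  # bit 0 is produced first, as in the VHDL loop
        bits.append("1" if value % 2 == 1 else "0")
        value //= 2
    return "".join(reversed(bits))


if __name__ == "__main__":
    print(out_gen(5))
    print(out_gen(-1))
```

Under these assumptions a negative value occupies only the low nine bits of the 16-bit vector; the upper seven bits stay '0' rather than being sign-extended.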
5.2.3 8/16 Bit Data Transfer
The data transfer between the host computer and the MIP board can be either 8 bits or 16
bits wide. The signal MEMCS16n indicates a 16-bit data transfer when it is low. The
process mip_MEMCS16n_process handles this signal.
608 mip_MEMCS16n_process:
609 PROCESS
610 BEGIN
611   wait on BALE, BCLK until BALE='0' and BCLK='0';
612   MEMCS16n <= transport '1' after MEMCS_DELAY;
613   wait on MEMRn, MEMWn;
614   if MEMWn='0' or MEMRn='0' then
615     MEMCS16n <= transport '0' after MEMCS_DELAY;
616   end if;
617 END PROCESS mip_MEMCS16n_process;
5.2.4 LA Latching
Each I/O cycle on the PC bus starts when the LA signal becomes valid, which is indicated
by the level of BALE as shown in figure 4.3. The following process, mip_adr_process, latches
the LA address whenever BALE is '1'.
618 mip_adr_process:
619 PROCESS
620   variable address_temp : integer;
621   variable data_temp: integer;
622 BEGIN
623   wait on BALE until BALE='1';
624   ADDRESS_REG <= LA_REG*2**19; -- mapping address (23:19)
625 END PROCESS mip_adr_process;
5.2.5 I/O Transfer
After the LA address is valid, any of the four I/O operations (defined in section 4.4) can
be initiated by the IORn, IOWn, MEMRn, or MEMWn signals. In addition, the RESET
signal is used to reset the MIP. The detailed timing information can be found in figures 4.3
and 4.4.
Register Write
The following process either transfers a new value from the PC bus to an on-board
input register or resets the registers. The process contains multiple wait statements.
It is first invoked by an IOWn event to generate the address. It
is then suspended until the new SD value is made available by the converting process shown in
section 5.2.2. The case statement updates the content of the register at the address specified by
LA and SA. The MAP_START signal, toggled when register 30B is addressed, invokes the mip_MAP_process.
626 mip_IOW_process:
627 PROCESS
628   variable data_temp : integer;
629   variable address_temp : integer;
630 BEGIN
631   wait on RESET, IOWn until RESET='1' or IOWn='0';
632   if RESET='1' then
633     REG_30F_IN <= 0; -- disable memory write, pc adr=0h
634     REG_30E_IN <= 0; -- mask load, select memory 0
635     REG_30D_IN <= 0; -- don't select any memory for any bus
636     REG_30C_IN <= REG_30C_IN mod 2; -- keep the MAP op only (bit 0)
637     REG_30B_IN <= 1; -- reset REGS_STARTn
638   elsif IOWn='0' then
639     assert FALSE
640       report "write to registers"
641       severity NOTE;
642     address_temp := ADDRESS_REG + SA_REG;
643     wait on SD_IN_REG;
644     data_temp := SD_IN_REG;
645     case (address_temp) is
646       when ADR_300_IN =>
647         REG_300_IN <= data_temp;
656       when ADR_30B_IN =>
657         MAP_START <= not MAP_START;
658         REG_30B_IN <= data_temp;
667       when others =>
668         assert FALSE
669           report "non-exist input register address"
670           severity WARNING;
672     end case;
673   end if;
674 END PROCESS mip_IOW_process;
Memory or Mask Write
The address generation and data latching of this process are similar to those of the register
write process described above. However, the
process is sensitive to MEMWn instead of IOWn. The data latched in this process is
transferred to the mip_MAP_process described in section 5.2.6. Before a memory or mask
write starts, the base address for an image or a mask stored in the PC memory must be loaded into
the register REG_30F_IN. The latched base address in REG_30F_IN is compared with the
specified base address in LA_REG. The data is not transferred if the latched address
differs from the specified address.
675 mip_MEMW_process:
676 PROCESS
677   variable address_temp: integer;
678 BEGIN
679   wait on MEMWn;
680   if MEMWn='0' and MEMWn'EVENT then
681     address_temp := ADDRESS_REG + SA_REG;
682     assert FALSE
683       report "memory/mask load"
684       severity NOTE;
685     wait on SD_IN_REG;
686     MEMW_BUFFER <= SD_IN_REG;
687     if (extract_bits(REG_30E_IN,5,5)/=MASK_LOAD) then
688       RAM_ADDRESS <= extract_bits(address_temp,18,1);
689       assert FALSE
690         report "memory loading"
691         severity NOTE;
692     else
693       assert FALSE
694         report "mask loading"
695         severity NOTE;
696     end if;
697     if (LA_REG = extract_bits(REG_30F_IN,4,0)) then
698       MEMORY_MASK_LOAD <= not MEMORY_MASK_LOAD;
699     else
700       assert FALSE
701         report "non-existing memory address"
702         severity WARNING;
703     end if;
704   end if;
705 END PROCESS mip_MEMW_process;
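The address arithmetic used by this process can be checked with a small Python sketch. The definition of extract_bits is not shown in this section, so the version below is an assumption inferred from how the function is called: it returns bits high down to low of an integer, right-aligned. The helper names bus_address, ram_address and base_matches are hypothetical and exist only for this illustration.

```python
def extract_bits(value: int, high: int, low: int) -> int:
    """Assumed behavior of extract_bits: bits high..low (inclusive), right-aligned."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def bus_address(la_reg: int, sa_reg: int) -> int:
    # ADDRESS_REG holds LA_REG*2**19 (address bits 23:19); SA_REG supplies the rest.
    return la_reg * 2 ** 19 + sa_reg


def ram_address(address: int) -> int:
    # Bits 18..1 index the on-board RAM; bit 0 is dropped.
    return extract_bits(address, 18, 1)


def base_matches(la_reg: int, reg_30f_in: int) -> bool:
    # The transfer proceeds only if LA equals the base latched in bits 4..0 of 30F.
    return la_reg == extract_bits(reg_30f_in, 4, 0)


if __name__ == "__main__":
    addr = bus_address(4, 6)
    print(hex(addr), ram_address(addr), base_matches(4, 0x44))
```

With LA = 4 the base address is 200000 hex, which agrees with the value written to register 30F in the command file of section 5.4.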
Register or Memory Read
In System 1076 version 7.0, a multi-driven signal is not detected at
either compile time or run time. When the situation occurs, the value on the multi-driven
signal is overwritten by the newest signal value. Therefore, special attention
is required to resolve a multi-driven signal. In this case, the memory read and the register
read are both handled by mip_SD_output_process to avoid a multi-driven SD_out. The
process is invoked when either IORn or MEMRn is '0'. If IORn is '0', the value of the register
at the address given by address_temp is passed to SD_out. If MEMRn is '0', the
MEMORY_READ signal triggers the mip_MAP_process to pass the value from memory
to SD_out.
707 mip_SD_output_process:
708 PROCESS
709   variable address_temp : integer;
710 BEGIN
711   wait on IORn, MEMRn until IORn='0' or MEMRn='0';
712   address_temp := ADDRESS_REG + SA_REG;
713   if IORn='0' then
714     assert FALSE
715       report "read from registers"
716       severity NOTE;
717     case (address_temp) is
718       when ADR_300_OUT =>
719         SD_OUT_REG <= REG_300_OUT;
734       when others =>
735         assert FALSE
736           report "non-exist output register address"
737           severity WARNING;
738         -- should show a warning for illegal register address
739     end case;
740   elsif MEMRn='0' then
741     address_temp := ADDRESS_REG + SA_REG;
742     assert FALSE
743       report "memory read"
744       severity NOTE;
745     RAM_ADDRESS <= extract_bits(address_temp,18,1);
746     if (LA_REG = extract_bits(REG_30F_IN,4,0)) then
747       MEMORY_READ <= not MEMORY_READ;
748       wait on BUFFER_READY;
749       SD_OUT_REG <= MEMR_BUFFER;
750     else
751       assert FALSE
752         report "wrong memory address"
753         severity WARNING;
754     end if;
755   end if;
756 END PROCESS mip_SD_output_process;
5.2.6 Mathematical Operations
The mip_MAP_process performs the functions of ALU1, ALU2, and MU, as well as the
memory transfer. The functions of ALU1, ALU2, and MU have been described in chapter
4. The process performs the pipelined operations differently from the real circuit. In the
hardware design the input image is computed stage by stage, starting from ALU1 and
ending at the Volume Adder. The sequence can be found in figure 4.1. In the model, the intermediate
results between stages are stored in temporary buffers. These temporary buffers provide
easy access to the partial results between stages for debugging purposes.
The process can be invoked by four signals: MAP_START from mip_IOW_process when
register 00030B is selected, MEMORY_MASK_LOAD from mip_MEMW_process when
the system command MEMW is issued, MEMORY_READ from mip_SD_output_process when
the command MEMR is issued, and RESET from the PC bus when the whole system is
reset. In the following subsections, we mainly describe the system configuration. The
operations of each stage are presented, but the implementation of the VHDL models for
the corresponding physical blocks is discussed in chapter 6.
Configuration and Operations of the MIP
The operating procedures have been explained in section 4.5. The process mip_MAP_process
checks the memory write enable status, examines the setup of the ALUs and the MAP, and
establishes the local bus configurations. Error messages are issued if the configuration is
incorrect.
As shown in stage A of figure 4.5, the memory write status must be confirmed before
writing to an on-board memory or the mask.
783 if (extract_bits(REG_30F_IN,6,6)=1) then -- memory & mask write enable
948 else
949   assert FALSE
950     report "memory/mask write disable: check 30F"
951     severity ERROR;
952 end if;
The implementation of stage B in figure 4.5 is accomplished by retrieving the constants
for ALUs and decoding the op-codes for the ALUs, the MAP, and the Volume Adder.
825 alu1_const := extract_bits(REG_306_IN,0,0)*2**8 + REG_305_IN;
826 if (alu1_const > MAXNUM) then
827   alu1_const := alu1_const - 2**(WORD_LENGTH);
828 end if;
829 alu1_op := extract_bits(REG_306_IN,7,4);
849 map_op := extract_bits(REG_30C_IN,0,0);
856 alu2_const := extract_bits(REG_308_IN,0,0)*2**8 + REG_307_IN;
857 if (alu2_const > MAXNUM) then
858   alu2_const := alu2_const - 2**(WORD_LENGTH);
859 end if;
860 alu2_op := extract_bits(REG_308_IN,7,4);
861 alu2_select := extract_bits(REG_308_IN,3,2);
937 volume_op := extract_bits(REG_300_IN,0,0);
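The way a 9-bit signed ALU constant is assembled from two 8-bit registers can be illustrated numerically. The Python sketch below assumes WORD_LENGTH = 9 and MAXNUM = 255 (the largest positive 9-bit two's complement value); neither constant is defined in this section, so both are assumptions, and the function name alu_constant is hypothetical.

```python
# Assumed constants: 9-bit signed words, so the largest positive value is 255.
WORD_LENGTH = 9
MAXNUM = 2 ** (WORD_LENGTH - 1) - 1


def alu_constant(reg_high: int, reg_low: int) -> int:
    """Combine bit 0 of the high register with the 8-bit low register.

    For ALU1 the registers would be 306 (high) and 305 (low);
    for ALU2 they would be 308 (high) and 307 (low).
    """
    raw = (reg_high & 1) * 2 ** 8 + reg_low
    if raw > MAXNUM:
        # Values above MAXNUM are two's complement codes of negative numbers.
        raw -= 2 ** WORD_LENGTH
    return raw


if __name__ == "__main__":
    print(alu_constant(0x20, 0x00))  # op code in bits 7..4, constant bit 8 clear
    print(alu_constant(0x01, 0xFF))
```

Only bit 0 of the high register contributes to the constant; bits 7..4 of the same register carry the op code and are ignored here.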
In stage C of figure 4.5, memories are connected with the local buses X1, X2, and Y.
The connections are modeled by copying the contents of the memory into buffers. A better
way to emulate the connection would be to access the memory through a pointer. This could be
accomplished with the access type in a fully implemented VHDL. If a memory is connected
with the X1 bus as the input source, the contents of the memory are copied to the X1 buffer
(x1_buffer).
786 x1bus_cnt:=0; x2bus_cnt:=0;
787 bus_sel3:=extract_bits(REG_30D_IN,7,6);
788 case (bus_sel3) is
789   when 0 => x1_buffer := memory3; x1bus_cnt:= x1bus_cnt+1;
790   when 1 => x2_buffer := memory3; x2bus_cnt:= x2bus_cnt+1;
791   when others => null;
792 end case;
811 assert not (x1bus_cnt<1)
812   report "no memory connects to x1_bus: check 30D"
813   severity ERROR;
814 assert not (x1bus_cnt>1)
815   report "more then one memory connect to x1_bus: check 30D"
816   severity ERROR;
817 assert not (x2bus_cnt<1)
818   report "no memory connects to x2_bus : check 30D"
819   severity WARNING;
820 assert not (x2bus_cnt>1)
821   report "more then one memory connected to x2_bus : check 30D"
822   severity ERROR;
913 ybus_cnt:=0;
914 if (bus_sel3=2) then
915   memory3 := y_buffer;
916   ybus_cnt:= ybus_cnt+1;
926 end if;
927 assert not(ybus_cnt<1)
928   report "no memory connects to y_bus: check 30D"
929   severity ERROR;
930 assert not(ybus_cnt>1)
931   report "more then one memory connects to y_bus: check 30D"
932   severity ERROR;
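The connection counting above can be sketched in Python. Only the field for memory 3 (bits 7:6 of register 30D) appears in the excerpt; the sketch assumes that memories 2, 1 and 0 use bits 5:4, 3:2 and 1:0 in the same way, and that field value 3 means "not connected". This layout is an assumption, chosen because it is consistent with the comment on the value 00F8 in the command file of section 5.4. The function names are hypothetical.

```python
# Assumed meaning of each 2-bit bus-select field in register 30D.
BUS_OF_FIELD = {0: "x1", 1: "x2", 2: "y"}  # 3 is taken to mean "not connected"


def count_bus_connections(reg_30d_in: int) -> dict:
    """Count how many of the four memories are attached to each local bus."""
    counts = {"x1": 0, "x2": 0, "y": 0}
    for memory in range(4):
        field = (reg_30d_in >> (2 * memory)) & 0b11
        bus = BUS_OF_FIELD.get(field)
        if bus is not None:
            counts[bus] += 1
    return counts


def check_bus_connections(counts: dict) -> list:
    """Return (severity, message) pairs mirroring the assertions in the listing."""
    problems = []
    for bus in ("x1", "x2", "y"):
        if counts[bus] < 1:
            # A missing X2 connection is only a warning; X1 and Y are mandatory.
            severity = "WARNING" if bus == "x2" else "ERROR"
            problems.append((severity, f"no memory connects to {bus}_bus"))
        elif counts[bus] > 1:
            problems.append(("ERROR", f"more than one memory connects to {bus}_bus"))
    return problems


if __name__ == "__main__":
    counts = count_bus_connections(0xF8)
    print(counts, check_bus_connections(counts))
```

Under this assumed layout, the value 00F8 connects memory 0 to the X1 bus and memory 1 to the Y bus, leaving the X2 bus unconnected, which triggers only the warning.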
A MIP user should be careful not to connect the X2 bus to both ALU1 and ALU2. Although
this erroneous setup is not prevented by the hardware, our VHDL model checks for it
with the following statements.
863 -- x2 bus status check
864 case (alu1_op) is
865   when COPY1B|COPY2B|ADDAC1|ADDAC2|SUBAC1|SUBAC2 =>
866     alu1_x2:=FALSE;
867   when others => alu1_x2:=TRUE;
868 end case;
869
870 case (alu2_select) is
871   when ALU2_CTOA => -- in_a = x2_bus, in_b = map_xout
872     case (alu2_op) is
873       when COPY1B|COPY2B => alu1_x2:=FALSE;
874       when others => alu1_x2:=TRUE;
875     end case;
876   when ALU2_CTOB => -- in_a = map_yout, in_b = x2_bus
877     case (alu2_op) is
878       when MIN1AB|MAX1AB|COPY1B|COPY2B|ADDAB1|
879            ADDAB2|SUBAB1|SUBAB2 =>
880         alu2_x2:=TRUE;
881       when others => alu2_x2:=FALSE;
882     end case;
883   when ALU2_CAB => -- in_a = x2_bus, in_b = x2_bus
884     alu2_x2:=TRUE;
885   when others => NULL; -- pseudo option
886 end case;
887 assert not(alu1_x2 and alu2_x2)
888   report "bus conflict between alu1, alu2, x2 bus"
889   severity ERROR;
Stage C in figure 4.5 shows that the image is processed by ALU1, the MAP,
ALU2, and the Volume Adder. These operations are executed by the following statements.
844 alu1_process(x1_buffer, x2_buffer, alu1_const, alu1_op, alu_buffer);
850 map_process(alu_buffer, mask, map_op, APPLIED_ROW_SIZE, map_buffer);
895 case (alu2_select) is
896   when ALU2_NORM => -- in_a = map_yout, in_b = map_xout
897     alu2_process(map_buffer, alu_buffer, alu2_const,
898                  alu2_op, alu2_max_min, y_buffer);
909 end case;
910 REG_307_OUT <= extract_bits(alu2_max_min,7,0);
911 REG_308_OUT <= extract_bits(alu2_max_min,8,8);
938 volume_adder_process(y_buffer, volume_op, volume_reg0,
939                      volume_reg1);
940 REG_300_OUT <= extract_bits(volume_reg0,7,0);
944 REG_304_OUT <= extract_bits(volume_reg1,15,8);
Memory Write
After the memory write enable status is checked, either the mask or one of the on-board
memories can be written, provided that the configuration is correct.
962 if (extract_bits(REG_30E_IN,5,5)/=MASK_LOAD) then
963   -- memory or mask?
964   case (extract_bits(REG_30E_IN,4,3)) is -- memory select
965     when 3 => memory3(RAM_ADDRESS) := memory_temp;
969     when others => -- should show a warning
970       assert FALSE
971         report "illegal memory chip select: check 30E"
972         severity ERROR;
973   end case;
974 else
975   for i in 0 to mask'LENGTH-2 loop -- MASK SHIFT LOAD
976     mask(i) := mask(i+1);
977   end loop;
978   mask(mask'LENGTH-1) := memory_temp;
979 end if;
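The mask branch of the listing loads the window by shifting: each write moves every stored entry down by one position and places the new value at the end. The Python sketch below illustrates this with a 49-entry list standing in for the 7 x 7 mask; the function name mask_shift_load is hypothetical.

```python
def mask_shift_load(mask: list, value: int) -> None:
    """Shift all entries toward index 0 and store value in the last position."""
    for i in range(len(mask) - 1):
        mask[i] = mask[i + 1]
    mask[len(mask) - 1] = value


if __name__ == "__main__":
    mask = [0] * 49  # a 7 x 7 window stored as a flat array
    for value in range(49):
        mask_shift_load(mask, value)
    print(mask[:7])
```

After as many writes as the mask has entries, the first value written sits at index 0 and the last at the highest index, so a full window must be sent for every mask load.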
Memory Read
In stage D shown in figure 4.5, the X1 bus is connected with the memory from which the
image is read by the host computer. The memory selected in register 00030E must match
the X1 bus connection selected in register 00030D.
988 if (extract_bits(REG_30D_IN,3,0)/2=extract_bits(REG_30E_IN,4,3)) then
989   case (extract_bits(REG_30D_IN,3,0)) is -- memory select
990     when 8 => memory_temp := memory3(RAM_ADDRESS);
994     when others => NULL;
995   end case;
996 else
997   assert FALSE
998     report "unmatched controller & memory: check 30D and 30E"
999     severity WARNING;
1000   memory_temp := UNKNOWN;
1001 end if;
5.3 MIP Structural Model
The behavioral model discussed above clearly illustrates the functionality of the MIP.
However, it does not carry architectural information. In order to see how the functionality of
the MIP is implemented in hardware, we must use the structural model, which is shown
in figure 5.3. Each block in the figure except PCBUS is the behavioral model of the
corresponding circuit. PCBUS is a BLM model written by Jeff Hanzlik to emulate the host
computer. Since System 1076 version 7.0 does not support file input and output, a
VHDL testbench cannot be written in a meaningful way. Therefore, we adopted the BLM
model of PCBUS into our structural model for simulation purposes. The VHDL models of the
other components will be discussed in the following four chapters. In this section, we discuss
the timing and give an overview of the MIP.
Figure 5.3: MIP Structural Model with PC BUS
5.3.1 Timing of the MIP
Figure 5.4 shows the general timing of the MIP (FPGA version) provided by Jens
Rodenberg [9]. The address and data for a pixel are represented by the location of the pixel in
(row, column) format. Timing is shown for one complete image pass, with the relevant pixel
locations and control pulses shown in boldface. The last part of the previous image and the
beginning of the image following the complete image are also shown. All waveforms found
on the timing diagram are described below. Most of the descriptions are taken directly from Jens
Rodenberg [9]:
START_MEM(X1): originates from the Master Controller. There can be one or more
START_MEM(X1) signals with identical waveforms. The number of signals depends
on the bus mode. When the bus mode is 1, only one START_MEM signal is generated,
i.e., only one memory is selected. The output of the selected memory is connected to
the A input of ALU1 through the X1 bus (which is controlled by X1_BUS_SELn).
When the bus mode is 0, two START_MEM signals are generated. The outputs of the
selected memories are connected to the A and B inputs of ALU1 through the X1 and X2
buses, respectively. On the next rising clock edge, the memory address counter in each
selected MEM_CONTROL chip will start counting. The newly generated addresses
(X1 Addr) will be used by the respective memories to output data (X1 Data) onto the
appropriate data buses during the same clock cycle in which the addresses are generated. X1 Addr
and X1 Data denote the addresses and data associated with the memories connected
with ALU1.
X1 Addr: is the address generated by the MEM_CONTROL chip controlling the memory
designated to drive the X1 bus when the bus mode is 1. When the bus mode is 0, X1
Addr also includes the address generated by the MEM_CONTROL chip controlling
the memory designated to drive the X2 bus.
X1 Data: is the data contained in the selected memories associated with ALU1 at X1
Addr. The data is clocked into the input flip-flops of ALU1 one clock cycle after the
address is generated.
INIT ALU1: instructs ALU1 to load its next instruction upon the next rising clock edge.
This initializes ALU1 at the same time that the first valid image pixel (0,0) is its
operand.
Figure 5.4: MIP Timing Chart
Figure 5.5: MIP Timing Chart (continued)
ALU1 Operand: is the output of the input flip-flops of ALU1. This is the pixel being
operated on by ALU1 during any given clock period. The pixel will be clocked into
ALU1's output flip-flops on the next rising clock edge.
ALU1 Output: is the output of the output flip-flops of ALU1. The pixel is clocked into the
input of the MAP on the next rising clock edge.
First MAP Operand: is the operand of the first adder in the MAP.
Start Blank Counter: instructs the blank counter to start counting on the next rising clock
edge. In the VLSI version, Start Blank Counter occurs one clock period later than
in the FPGA version. The reason is explained in section B.1.3.
START PROC: informs the MAP that the first valid image pixel will be in the target
pixel position upon the next rising clock edge. The MAP uses this signal to latch
the window values and the desired morphological operation for the next image processing
operation.
Blank counter count: is used to generate the row and column blanking signals.
Target pixel: is the pixel being operated on by the adder in the middle of the MAP, which
also corresponds to the middle of the window.
C1 operand: The pixels in the first level of the comparison tree. There are 49 pixels that
are inputs to the first comparison tree level since all 49 adders in the MAP perform
a simultaneous addition of a potential result.
CIS operand: The pixels in the last level of the comparison tree. The result of the
comparison, which is the output value of the MAP, gets clocked into the output flip-flops of
the MAP.
MAP output: is the output of the output flip-flops of the MAP.
START_MEM(X2): originates from the Master Controller. When the bus mode is 1, one
START_MEM is generated, and the output of the selected memory is connected to the
C input of ALU2 through the X2 bus (which is controlled by X2_BUS_SELn). On
the next rising clock edge, the memory address counter in the selected MEM_CONTROL
chip will start counting. The newly generated address (X2 Addr) will be used by
the memory to output data (X2 Data) onto the X2 data bus during the same clock cycle
in which the address is generated. X2 Addr and X2 Data denote the address and data
associated with the memory which is connected with ALU2. When the bus mode is 0, no
START_MEM(X2) signal is generated.
X2 Addr: is the address generated by the MEM_CONTROL chip controlling the memory
designated to drive the X2 bus when the bus mode is 1. When the bus mode is 0,
this address will not be generated.
X2 Data: is the data contained in the selected memory associated with ALU2 at X2
Addr when the bus mode is 1. When the bus mode is 0, this data will not be generated.
The data is clocked into the input flip-flops of ALU2 one clock cycle after the address
is generated.
INIT ALU2: instructs ALU2 to load its next instruction upon the next rising clock edge.
This initializes ALU2 at the same time that the first valid image pixel (0,0) is its
operand.
STOP ALU2: informs ALU2 that the last valid pixel of the image being processed is its
output on the next rising clock edge. This is used to capture the maximum or the
minimum value of the pixels in the processed image, if that ALU operation was selected.
ALU2 Operand: is the output of the input flip-flops of ALU2. This is the pixel being
operated on by ALU2 during any given clock period. The pixel will be clocked into
ALU2's output flip-flops on the next rising clock edge.
ALU2 Output: is the output from the output flip-flops of ALU2. The pixel is clocked into
the input of the Volume Adder on the next rising clock edge. This is also the final
output which will be written into the memory selected to contain the output image.
START_MEM(Y): originates from the Master Controller. It goes to the MEM_CONTROL
chip controlling the memory selected to receive the final output of a processing
operation. On the next rising clock edge, the memory address counter in the selected
MEM_CONTROL chip will start counting. The newly generated addresses (Y Addr)
will be used by the memory selected to store the resultant pixels from ALU2.
Y Addr: is the address generated by the MEM_CONTROL chip controlling the memory
designated to receive the resultant pixel from ALU2.
5.3.2 Overview of the MIP
The structural model of the MIP in figure 5.3 can be thought of as the realization of the
MIP Data Path shown in figure 4.1. Although we have described the MIP system
and the individual components in detail in chapter 4, the inter-relations between the control units
and the processing units were not discussed there. The purpose was to clearly illustrate
the functionality of the MIP from the system point of view. However, it is important to
see how the control units and the processing units work together to process an image as
desired. Therefore, we briefly describe the inter-relations between these units.
The MIP board is controlled by a host computer, which is emulated by the BLM model
PCBUS. All of the commands described in table 4.11 come from the host computer. Each
command is associated with an address value and a data value. The address can indicate
either a selected memory location or a register. If the address is for a selected memory
location and the MEMWn or MEMRn signal is activated, the corresponding Memory
Controller passes the address directly to the memory. If the address is for a register and
the IOWn or IORn signal is activated, the register address is decoded by the
Bus Interface to produce register control signals. These register control signals, denoted
REGSn(13:0), go from the Bus Interface to the Controller, ALU1, the MAP,
ALU2, or the Volume Adder. The commands from the host computer are sent to the
registers in any of these components via the SD bus.
A Memory Controller accepts control signals from both the Bus Interface and the Master
Controller. When PC_CS is high, the control signals from the Bus Interface allow the host
computer to access one of the on-board memories. When PC_CS is low, the control signals
from the Controller allow the processing units to access the on-board memories.
The memory bank provides input images for ALU1 or ALU2 via the X1 bus and the X2 bus.
The output image of ALU1 is sent to the MAP, where it undergoes either dilation or erosion.
The window used by the MAP for the dilation or erosion is downloaded from the host computer
through the SD bus. The original and resultant images of the MAP then enter ALU2. The third
input image for ALU2 comes from one of the memories. Two of the three input images are
manipulated by ALU2, and the processed image is sent back to the selected memory and
to the Volume Adder. The Volume Adder sums either the original or the squared values of
all pixels in the image and sends the result back to the host computer via the SD bus.
5.4 Simulation
The simulation of both the behavioral and the structural models of the MIP has been
performed using a 32 x 32 image. The image file is in ASCII format. Each pixel is
represented by a four-character hexadecimal string, and pixels are delimited by spaces. Each line
contains 16 pixels. The 32 x 32 image was obtained through several steps. First, a graph was
drawn using Microsoft Paintbrush. In order to obtain the required image size, the graph was
scanned with a Xerox 7650 scanner. The output from the scanner was in TIFF format,
which was then converted into IMG format by a C program written by Yidong Chen.
The IMG format is the display protocol used by the PC to display the image. Finally, the IMG
format was converted into ASCII format by another C program. Part of the 32 x 32
image in ASCII format is shown in figure 5.6.
0100 0100 0100 0100 0100 0100 0100 0100 0100 0100 0100 0100 0100 0100 0100 0100
0100 0100 0100 0100 0100 0100 0100 0100 0100 0100 0100 0100 0100 0100 0100 0100
0100 0100 0100 0100 0100 0100 0100 0100 0100 0100 0100 0100 0100 0100 0100 0100
0100 0100 0100 00FF 001E 0022 0022 00FF 00FF 00FF 00FF 00FF 00FF 00FF 00FF 00FF
00FF 00FF 00FF 00FF 0017 0017 0017 001F 001F 005B 005B 00FF 00FF 0100 0100 0100
0100 0100 0100 001E 001D 001E 0022 001F 00FF 00FF 00FF 00FF 00FF 00FF 00FF 00FF
00FF 00FF 00FF 0017 0017 0017 0017 0017 001F 001F 00FF 00FF 00FF 0100 0100 0100
0100 0100 0100 001D 001D 001D 001E 001F 001A 00FF 00FF 00FF 00FF 00FF 00FF 00FF
00FF 00FF 00D5 0047 0017 0017 0017 001F 001F 00FF 00FF 00FF 00FF 0100 0100 0100
Figure 5.6: A Partial 32 X 32 image in ASCII format
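The ASCII image format described above is simple enough to parse in a few lines. The Python sketch below is an illustration only and is not one of the C conversion programs mentioned in the text. It assumes that pixels are stored in row-major order, so that two consecutive 16-pixel lines form one 32-pixel image row; the function names are hypothetical.

```python
def parse_ascii_image(text: str, pixels_per_line: int = 16) -> list:
    """Parse lines of space-delimited four-character hexadecimal pixels."""
    pixels = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if len(tokens) != pixels_per_line:
            raise ValueError(f"expected {pixels_per_line} pixels, got {len(tokens)}")
        for token in tokens:
            if len(token) != 4:
                raise ValueError(f"pixel {token!r} is not four characters")
            pixels.append(int(token, 16))
    return pixels


def to_rows(pixels: list, width: int = 32) -> list:
    """Group a flat pixel list into image rows (row-major order is assumed)."""
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


if __name__ == "__main__":
    sample = (
        "0100 0100 0100 00FF 001E 0022 0022 00FF 00FF 00FF 00FF 00FF 00FF 00FF 00FF 00FF\n"
        "00FF 00FF 00FF 00FF 0017 0017 0017 001F 001F 005B 005B 00FF 00FF 0100 0100 0100\n"
    )
    rows = to_rows(parse_ascii_image(sample))
    print(len(rows), len(rows[0]), rows[0][:5])
```

With 16 pixels per line, a 32 x 32 image occupies 64 lines, which matches the LINES 64 entry in the command file discussed later in this section.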
Identical output images were obtained from simulating both models with the
same input ASCII image. The resultant image was verified by running MIP.bin, a program
written by Jens Rodenberg to examine the operations in the MAP. Figure 5.7 illustrates the
simulation result. The image in the top-left corner is the original image. The image was
inverted for display purposes. Therefore, the dark areas contain
high-signal pixels while the bright areas contain low-signal pixels. The original image is then eroded
consecutively by applying the window shown in table 5.2. The image in the top-right corner is
obtained after the first erosion. The image size is smaller, and the low-signal area is expanded,
i.e., the ring is wider. The images in the lower-left and lower-right corners are obtained after
the second and third erosions, respectively.
Figure 5.7: Resultant Image, top-left :original image, top-right: first erosion, bottom right:
second erosion, bottom left: third erosion
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 o 0
0 0 o o
0 0 o 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
Table 5.2: The Window Array
We noticed that the simulation time for the behavioral model is much shorter than that for
the structural model. For example, the behavioral model took 16 minutes to complete the
simulation of a 32 x 32 image on an Apollo 3500 workstation. The simulation time consumed
by the structural model depends on the size of the FIFO used. On the same workstation, it
took 136 minutes if a 32 x 7 FIFO is used, or 341 minutes if a 512 x 7 FIFO is used. Assuming
the simulation time is linear in the number of pixels, a 512 x 512 image will require 68 hours
of simulation time for the behavioral model, or 1455 hours (61 days) for the structural model
with the 512 x 7 FIFO!
Therefore, it is impractical to simulate the structural model for a 512 x 512 image due to
the simulation time required. However, it is sufficient to simulate the MIP with a 32 x 32
image since all the boundary conditions are identical for 32 x 32 and 512 x 512 images.
The major reason for the slow simulation of the structural model is its enormous
number of signals, most of which require event scheduling during simulation. The simulation
of the 512 x 512 natural image in Figure 5.8 was performed, and the resultant image is shown
in Figure 5.9. We noticed first that the bottom part of the image is processed twice, while
the top part of the image has disappeared. Secondly, the bottom part of the resultant image
is black, i.e., the values of these pixels were not written to the resultant image file. The first
observation is due to the memory access limitation of System 1076 version 7.0. Although
it is not documented by Mentor Graphics, it seems that the maximum array size that can
be achieved is 2^17 words of integer type. The confusing point is that the system did not
issue any error message when the array index exceeded the maximum array size. Instead,
it overwrote the first half of the image. The second observation is due to the fact that our
simulation had exceeded the maximum number of simulation time steps of the system. In
order to confirm the explanations given above, the first half of the image was simulated
separately, and the resultant image is shown at the top of Figure 5.10. The bottom of Figure 5.10
is the processed image of the second half. It is evident that the simulation result is correct.
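The extrapolated simulation times and the array-size observation can be checked with simple arithmetic. The Python sketch below assumes, as the text does, that simulation time scales linearly with the number of pixels; the function name scaled_minutes is hypothetical.

```python
def scaled_minutes(minutes_for_32: float, size: int = 512, base: int = 32) -> float:
    """Scale a measured 32 x 32 simulation time to a size x size image."""
    return minutes_for_32 * (size * size) / (base * base)


# Behavioral model: 16 minutes measured for a 32 x 32 image.
behavioral_hours = scaled_minutes(16) / 60

# Structural model with the 512 x 7 FIFO: 341 minutes measured.
structural_hours = scaled_minutes(341) / 60
structural_days = structural_hours / 24

# Apparent array limit of 2**17 integer words versus the image sizes used.
max_words = 2 ** 17
full_image_words = 512 * 512
half_image_words = 512 * 256

if __name__ == "__main__":
    print(round(behavioral_hours), round(structural_hours), round(structural_days))
    print(full_image_words > max_words, half_image_words == max_words)
```

The scale factor is (512 x 512) / (32 x 32) = 256. A full 512 x 512 image needs 2^18 words, twice the apparent limit, while each 512 x 256 half fits exactly, which is consistent with the two-pass experiment shown in Figure 5.10.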
Figure 5.8: Original 8-bit grey scale image
Figure 5.9: Resultant image of a complete 512 x 512 image after erosion
Figure 5.10: Resultant image of two 512 x 256 images after erosion
Since the BLM model of the PCBUS emulates the host computer, a command file for
the PCBUS is required to run the simulation. In this section we explain the command
file listed below.
1 !map 32x32
2 WAP1
3 LINES 64
4 IOW 00030f 0044 #write enable+ Address = 200000h
5 IOW 00030e 0000 #memory load, select chip0 controller
6 IOW 00030d 0001 #PC -> memory0
7 PICLD 1ffffe images/co1.txt
8 IOW 00030e 0020
9 MASKLD 200000
10 00ff 00ff 00ff 00ff 00ff 00ff 00ff
11 00ff 00ff 00ff 00ff 00ff 00ff 00ff
12 00ff 00ff 0000 0000 0000 00ff 00ff
13 00ff 00ff 0000 0000 0000 00ff 00ff
14 00ff 00ff 0000 0000 0000 00ff 00ff
15 00ff 00ff 00ff 00ff 00ff 00ff 00ff
16 00ff 00ff 00ff 00ff 00ff 00ff 00ff
17 #
18 # Setup to process m0 and output to m1
19 # ALU1: copy(A) ALU2: copy(B)
20 # map will do a MIN or erosion
21 #
22
23
24
25 IOW 000300 0000 # Sum X in Volume Adder
26 IOW 000305 0000 # alu1 const = 0
27 IOW 000306 0020 # alu1 op = copy(A)
28 IOW 000307 0000 # alu2 const = 0
29 IOW 000308 0020 # alu2 op = copy(A)
30 IOW 00030C 0000 # map mode = MIN
31 IOW 00030D 00F8 # M3: nop M2: nop M1: Y M0: X1
32 IOW 00030B 0000 # start the image
33 IOW 00030E 0008 # memory select chip1 controller
34 IOW 00030D 0002 # M1 -> PC
35 PICRD 1ffffe images/co2.txt
36 #
37 # read the Volume Adder
38 #
39 ! IOR 000300
40 IOR 000300
41 ! IOR 000301
42 IOR 000301
43 ! IOR 000302
44 IOR 000302
45 ! IOR 000303
46 IOR 000303
47 ! IOR 000304
48 IOR 000304
49 END 123456
50 END
Notice that # is the comment character: anything after # is ignored. ! is the output-text
character: anything after ! is sent to the transcript window of Quicksim. WAP1 in line 2
selects a 7 x 7 mask window and a 32 x 32 image size, while LINES 64 in line 3
instructs the model to expect 64 lines of 16 pixels each from the image file.
In line 4, the Address Control Register (00030f) in the Bus Interface is configured so that
write enable for either memory load or mask load is activated when the address is 200000 hex.
Line 5 then sets up the Segment Register (00030e) by selecting Memory Controller 0 for
the memory load operation. The command in line 6 chooses memory0 to store the original image
from the host computer. Line 7 issues the PICLD command so that the image in the host
computer is loaded into the selected memory, i.e., memory0 in this case. Since the PCBUS
pre-increments the address, 1ffffe instead of 200000 is used as the beginning address of the
image stored in the host computer. After the image is loaded into memory0, the segment
register is reconfigured for mask load in line 8. The mask values are taken from lines 10 to 16
of the command file. It should be pointed out again that memory load must precede mask load.
Otherwise, the buffer between the Y bus and the selected memory storing the output image
will be open and the output image will be lost.
The commands between line 25 and line 34 configure the processing units accordingly.
Since the copy(A) command is issued to both ALU1 and ALU2, the image is processed only in
the MAP. For this particular example, the constants set up for ALU1 and ALU2
are irrelevant since they are not used in the process. In line 30, the MAP is configured
for erosion. Line 31 indicates that the original image is in memory0 and that the resultant
image will be stored in memory1. Memory2 and memory3 are not used for the moment. Line 33
selects memory controller 1 so that the host computer can access the corresponding memory
to read the resultant image, which is accomplished by the PICRD command in line 35. The
last section reads the values stored in registers 000300 through 000304 of the
Volume Adder.
Chapter 6
Mathematic Units
The functionality of the ALU1, the MAP, the ALU2 and the Volume Adder has been
discussed in Chapter 4. In this chapter we discuss the implementation of these functions
in VHDL. The operations of the four units are pipelined, i.e., the input image is shifted
through the four stages pixel by pixel. Therefore, the speed of image processing is greatly
improved.
Each of the operation units is described in three parts: a general description of the
functionality, the input/output port description and the major processes in the architecture.
6.1 ALU1
The ALU1 is a pre-processor for the MAP. The two inputs of the ALU1 are connected to
the on-board memory bank. The operations of the ALU are described in section 4.2.1. The
output pixels are shifted into the MAP and the FIFO.
A dedicated ALU1 circuit is used only in the FPGA version. In the VLSI version, the
ALU1 is replaced by the ALU2 to reduce the manufacturing cost. This substitution was
made because the functionality of the ALU1 is a subset of the functionality of the ALU2.
Two of the operations provided by the ALU2 (4x and 5x) are not fully functional in the ALU1,
where they simply perform the "copy a" function.
6.1.1 Ports
The following list is the port description of the ALU1:
clk: is the on-board system clock.
a: connects with the memory bank through the X1 BUS as the first input of the ALU1.
b: connects with the memory bank through the X2 BUS as the second input of the ALU1.
start_alu1: indicates the start of a new image on the next rising edge of the clk and
instructs the ALU1 to load the new operation and constant on the next clock.
sd: is the 8-bit input data bus for operations and constants. It is connected with bits (7 to
0) of the SD on the PC bus.
regsln: is an active low signal of register select status. The register address is hex 000305.
regshn: is an active low signal of register select status. The register address is hex 000306.
regenw: indicates write enable status. When it is active, a selected register will latch its
new contents from the SD bus.
alu1_out: is the output of the ALU1.
6.1.2 Processes
The processes alu1_a_read_and_convert and alu1_b_read_and_convert are used to convert the
bus input into an integer. We will not discuss these processes further since similar code
has already been discussed in section 5.2.2.
The process load_op_to_buffer latches the new value on the SD bus, sd, into the register
aluop_buf when its address is selected and the write enable, regenw, is active.
load_op_to_buffer:
PROCESS
BEGIN
  WAIT ON regenw, regshn UNTIL (regenw = '1') AND (regshn = '0');
  aluop_buf <= sd;
END PROCESS load_op_to_buffer;
The process load_const_to_buffer latches the new value on the sd into the register
aluconst_buf when its address is selected and the write enable is active.
load_const_to_buffer:
PROCESS
BEGIN
  WAIT ON regenw, regsln UNTIL (regenw = '1') AND (regsln = '0');
  aluconst_buf <= sd;
END PROCESS load_const_to_buffer;
The process load_new_const_and_op latches the new operation and constant values from
the registers aluop_buf and aluconst_buf into the registers aluop and aluconst when
start_alu1 is active. The aluop defines the operation of the ALU1 while the aluconst
stores the constant value used in some operations. Bit 0 of aluop_buf serves as the sign
bit (bit 8) of the 9-bit constant.
load_new_const_and_op:
PROCESS
  VARIABLE temp, i : integer;
BEGIN
  WAIT ON clk UNTIL (clk = '1') AND (start_alu1 = '1');
  -- bits 7 to 4 of aluop_buf hold the operation code
  temp := 0;
  FOR i IN 7 DOWNTO 4 LOOP
    CASE (aluop_buf(i)) IS
      WHEN '1' =>
        temp := temp*2+1;
      WHEN '0' =>
        temp := temp*2;
      WHEN 'X' | 'Z' =>
        temp := UNKNOWN;
        EXIT;
    END CASE;
  END LOOP;
  aluop <= temp;
  -- bits 7 to 0 of aluconst_buf hold the magnitude of the constant
  temp := 0;
  FOR i IN 7 DOWNTO 0 LOOP
    CASE (aluconst_buf(i)) IS
      -- ... same branches as in the first loop ...
    END CASE;
  END LOOP;
  IF (aluop_buf(0) = '1') AND (temp /= UNKNOWN) THEN
    temp := temp - 256;
  END IF;
  aluconst <= temp;
END PROCESS load_new_const_and_op;
The process alu1_processing models a combinational circuit within the ALU1. Whenever the
operation (aluop), constant (aluconst), input a (alu1_a), or input b (alu1_b) changes,
the process re-evaluates the output value (output).
alu1_processing:
PROCESS
  VARIABLE op, const, in_a, in_b : integer;
BEGIN
  WAIT ON aluop, aluconst, alu1_a, alu1_b;
  op := aluop;
  const := aluconst;
  in_a := alu1_a;
  in_b := alu1_b;
  output <= alu1(in_a, in_b, const, op);
END PROCESS alu1_processing;
The function alu1 used in alu1_processing includes all of the mathematic functions of the
ALU1.
FUNCTION alu1(a, b, aluconst, aluop: integer)
  RETURN integer IS
  VARIABLE output: integer := UNKNOWN;
BEGIN
  CASE (aluop) IS
    WHEN MIN1AB | MIN2AB =>
      IF (compare(a,b,MIN) = TRUE) THEN
        output := b;
      ELSE
        output := a;
      END IF;
    WHEN MAX1AB | MAX2AB =>
      IF (compare(a,b,MAX) = TRUE) THEN
        output := b;
      ELSE
        output := a;
      END IF;
    WHEN COPY1A | COPY2A =>
      output := a;
    WHEN COPY1B | COPY2B =>
      output := b;
    WHEN ADDAB1 | ADDAB2 =>
      output := adder(a, b);
    WHEN ADDAC1 | ADDAC2 =>
      output := adder(a, aluconst);
    WHEN SUBAB1 =>
      IF (b = MFIN) THEN
        output := MFIN;
      ELSE
        output := adder(a, -1*b);
      END IF;
    WHEN SUBAB2 =>
      IF (b = MFIN) THEN
        output := a;
      ELSE
        output := adder(a, -1*b);
      END IF;
    WHEN SUBAC1 =>
      IF (aluconst = MFIN) THEN
        output := MFIN;
      ELSE
        output := adder(a, -1*aluconst);
      END IF;
    WHEN SUBAC2 =>
      IF (aluconst = MFIN) THEN
        output := a;
      ELSE
        output := adder(a, -1*aluconst);
      END IF;
    WHEN OTHERS =>
      output := UNKNOWN;
  END CASE;
  RETURN output;
END alu1;
It should be noted that the function alu1 is based on comparison and addition. These
two operations are also implemented as functions. The function compare returns either
0 or 1 to indicate the relation between the arguments a and b.
FUNCTION compare(a,b,max : integer)
  RETURN integer IS
  VARIABLE comp : integer;
BEGIN
  IF ((a > b) XOR (max = 0)) THEN
    comp := FALSE;
  ELSE
    comp := TRUE;
  END IF;
  RETURN comp;
END compare;
The adder function adds two operands and returns the sum. It should be noted
that the circuit design has no overflow or underflow flag. If an overflow or underflow
occurs, the result saturates at the maximum (MAXNUM) or minimum (MINNUM) value, respectively.
FUNCTION adder (a,b : integer)
  RETURN integer IS
  VARIABLE sum : integer;
BEGIN
  IF ((a = UNKNOWN) OR (b = UNKNOWN)) THEN
    sum := UNKNOWN;
  ELSIF ((a = MFIN) OR (b = MFIN)) THEN
    sum := MFIN;
  ELSIF ((a >= 0) AND (b >= 0) AND ((a + b) > MAXNUM)) THEN
    sum := MAXNUM;
  ELSIF ((a < 0) AND (b < 0) AND ((a + b) < MINNUM)) THEN
    sum := MINNUM;
  ELSE
    sum := a + b;
  END IF;
  RETURN sum;
END adder;
The output value is converted from an integer, output, to a bus vector, alu1_out, in the
process output_processing.
output_processing:
PROCESS
  VARIABLE temp, i, choice : integer;
BEGIN
  WAIT ON clk UNTIL clk='1';
  temp := output;
  IF (temp < 0) THEN
    temp := temp + 512;
  END IF;
  IF (output /= UNKNOWN) THEN
    FOR i IN 0 TO 8 LOOP
      choice := temp mod 2;
      CASE (choice) IS
        WHEN 1 =>
          alu1_out(i) <= TRANSPORT '1' AFTER DELAY;
        WHEN 0 =>
          alu1_out(i) <= TRANSPORT '0' AFTER DELAY;
        WHEN OTHERS =>
          NULL;
      END CASE;
      temp := temp / 2;
    END LOOP;
  ELSE
    FOR i IN 0 TO 8 LOOP
      alu1_out(i) <= TRANSPORT 'X' AFTER DELAY;
    END LOOP;
  END IF;
END PROCESS output_processing;
6.2 MAP
The MAP accepts an input image from the ALU1, performs the morphological operations,
and outputs the processed image as well as the original image to the ALU2. The architecture
of the MAP is designed to perform two basic morphological operations: dilation and erosion.
The algorithm used in this section is different from the algorithm presented in section 2.4.
The dilation algorithm in section 2.4 processes an image by sliding a window through the
image. During processing, the central pixel of the window is aligned with a target pixel in
the image; then, the window values are added to the pixels surrounding the target
pixel; next, the target pixel value is replaced by the maximum value of the
sums; the procedure is repeated for every pixel of the image. Erosion is
similar to dilation except that a negated and rotated window is used and the minimum value
of the sums is chosen as the resultant value.
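As a compact restatement of the sliding-window description above, the two operations can be written as follows. This is only a sketch under assumptions: the symbols f (image), w (window) and w~ (negated, rotated window) and the index range -3 to 3 for a 7 x 7 window are notation introduced here, and the index convention of section 2.4 may differ.

```latex
\begin{align*}
  (f \oplus w)(r,c)  &= \max_{-3 \le i,\,j \le 3} \bigl[\, f(r+i,\,c+j) + w(i,j) \,\bigr]
      && \text{(dilation)} \\
  (f \ominus w)(r,c) &= \min_{-3 \le i,\,j \le 3} \bigl[\, f(r+i,\,c+j) + \tilde{w}(i,j) \,\bigr],
      \qquad \tilde{w}(i,j) = -\,w(-i,-j)
      && \text{(erosion)}
\end{align*}
```

In both cases 49 sums are formed per target pixel, which matches the 49 adders and the comparison tree of the MAP described later in this section.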
In the pipelined architecture designed by Jens Rodenberg and Jeff Hanzlik, a pixel
processed by the ALU1 proceeds to the MAP, the ALU2 and the Volume Adder. The
algorithm in section 2.4 requires a moving window and a complete stored image, which
cannot be provided during the pipelined process. In addition, it requires a buffer to store
the original image, which has to be pre-loaded before the operation starts. Therefore, it is
necessary to modify the algorithm for a pipelined architecture. The new algorithm uses
a still window and a moving image. It also reduces the size of the buffer, for example, from
512 x 512 to 512 x 7. The modified algorithm is presented in this section.
The architecture has been partitioned into FIFO and MAP. They are separated for two
reasons. First of all, it lowers the development cost because FIFO chips are commercially
available. Secondly, it is more flexible since the column width of the array can be changed
to match the image's column width. The column width of an image could be either 512 or
1024 in the VLSI version, although it is fixed at 512 in the FPGA version.
The FIFO is a shift register array. The size of the array is either 7 rows x 512 columns
or 7 rows x 1024 columns, depending on the image size. The input of the first register is
connected to the output of the ALU1. Therefore, each output pixel from the ALU1 is
shifted into this shift register array. The shift register array functions as a line delay that
temporarily stores 7 rows of the image.
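The line-delay behaviour of one FIFO row can be sketched as below. This is a minimal illustration, not the XFIFO model from the appendix: the entity name line_delay, the generic WIDTH, and the ports pix_in and pix_out are hypothetical, and pixels are carried as plain integers.

```vhdl
-- Hypothetical sketch of one FIFO row acting as a line delay.
-- Names (line_delay, WIDTH, pix_in, pix_out) are assumptions for illustration.
ENTITY line_delay IS
  GENERIC (WIDTH : positive := 512);   -- pixels per image row (assumed)
  PORT (clk     : IN  bit;
        pix_in  : IN  integer;
        pix_out : OUT integer);
END line_delay;

ARCHITECTURE behav OF line_delay IS
  TYPE row_array IS ARRAY (0 TO WIDTH-1) OF integer;
BEGIN
  shift : PROCESS
    VARIABLE row : row_array := (OTHERS => 0);
  BEGIN
    WAIT UNTIL clk = '1';
    -- the pixel leaving the row entered it WIDTH clock edges earlier
    pix_out <= row(WIDTH-1);
    FOR i IN WIDTH-1 DOWNTO 1 LOOP
      row(i) := row(i-1);
    END LOOP;
    row(0) := pix_in;
  END PROCESS shift;
END behav;
```

Chaining seven such rows, with the output of each row feeding the next, would give one tap per image row, which is the role the text assigns to the 7-row FIFO.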
The MAP consists of five parts: the image buffer block, the window buffer block, the
window register block, the adder block and the comparison tree. The image buffer block
consists of 7 shift register arrays. Each array has 7 registers. The input pixel of each array
is the first pixel of the corresponding row of the FIFO. With this connection, the image is
shifted through the image buffer row by row. The center of the image buffer is the target
pixel of the morphological operations. The window buffer is a 7 x 7 shift register array.
The input of the array is connected to the SD bus. A new window value is shifted into
the array upon the rising edge of the w_clk. The contents of the window buffer are latched
by the window registers on the rising edge of the start_proc as the new window for the
morphological operations. The adder block adds the window registers to the image buffer
to produce 49 sums. The sums are compared by the comparison tree to find
the maximum or minimum value as the result for the target pixel.
For the target pixels located within the first and last three rows as well as the first
and last three columns of an image, some of the surrounding pixels used in the addition
lie outside the image boundary. Therefore, the -∞ value is used for the sum
to indicate that the pixel value is undefined. The controlling mechanism is provided by
the Master Controller, which generates the row and column blanking signals through the
blank counter. Figure 6.1, prepared by Jens Rodenberg, shows the blanking sequence. The row
blanking bits 0 to 5 and the column blanking bits 0 to 5 are connected to the adder blocks
in the MAP. Bits 0 to 2 of the row blanking bits indicate the blanking status of rows
0 to 2 while bits 3 to 5 indicate the status of rows 4 to 6. The connecting sequence is
the same for the column blanking bits.
A 9-bit signed integer is used to represent a pixel. Values 0 to 255 are used to indicate
the grey scale levels for the pixel. Values -1 to -255 are used for a negated pixel value in
the window for erosion. The value -∞ is coded as -256 for an undefined pixel value.
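The 9-bit coding can be illustrated with a pair of conversion functions. This is a sketch under assumptions: the package name pixel_code, the subtype pixel_bits and the functions to_bits and to_int are hypothetical, and the standard type bit_vector is used in place of the simulator's qsim_state_vector. Only the value MFIN = -256 and the two's complement layout come from the text.

```vhdl
-- Hypothetical helper package illustrating the 9-bit signed pixel code.
PACKAGE pixel_code IS
  CONSTANT MFIN : integer := -256;   -- code for minus infinity (from the text)
  SUBTYPE pixel_bits IS bit_vector(8 DOWNTO 0);
  FUNCTION to_bits (v : integer) RETURN pixel_bits;
  FUNCTION to_int  (b : pixel_bits) RETURN integer;
END pixel_code;

PACKAGE BODY pixel_code IS
  -- integer in -256..255 to 9-bit two's complement
  FUNCTION to_bits (v : integer) RETURN pixel_bits IS
    VARIABLE temp : integer := v;
    VARIABLE bits : pixel_bits;
  BEGIN
    IF temp < 0 THEN
      temp := temp + 512;            -- wrap negative values into 256..511
    END IF;
    FOR i IN 0 TO 8 LOOP
      IF (temp MOD 2) = 1 THEN bits(i) := '1'; ELSE bits(i) := '0'; END IF;
      temp := temp / 2;
    END LOOP;
    RETURN bits;
  END to_bits;

  -- 9-bit two's complement back to integer; bit 8 carries weight -256
  FUNCTION to_int (b : pixel_bits) RETURN integer IS
    VARIABLE temp : integer := 0;
  BEGIN
    FOR i IN 7 DOWNTO 0 LOOP
      temp := temp * 2;
      IF b(i) = '1' THEN temp := temp + 1; END IF;
    END LOOP;
    IF b(8) = '1' THEN temp := temp - 256; END IF;
    RETURN temp;
  END to_int;
END pixel_code;

-- Self-checking use of the package
USE work.pixel_code.ALL;
ENTITY pixel_code_check IS END pixel_code_check;
ARCHITECTURE test OF pixel_code_check IS
BEGIN
  PROCESS
  BEGIN
    ASSERT to_bits(MFIN) = "100000000" REPORT "MFIN code wrong" SEVERITY error;
    ASSERT to_bits(255) = "011111111" REPORT "255 code wrong" SEVERITY error;
    ASSERT to_int(to_bits(-1)) = -1 REPORT "round trip failed" SEVERITY error;
    WAIT;
  END PROCESS;
END test;
```

With this layout the undefined value occupies the single most negative code, so it can never collide with a grey level or a negated window value.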
The MAP has been through three different implementations. After the architecture
was designed, Jeff and Jens realized that the circuit was much too large to fit into one
FPGA chip. They partitioned the entire circuit into 24 FPGA chips. Each of the chips
contains part of the addition block and the comparison block. This affects the architecture
by extending the pipelined stages from 7 to 14. When Larry Rubin implemented the
architecture in an ASIC, he packed the whole MAP into one ASIC chip. However, the
manufacturing cost of the single chip is quite high. To keep the cost within
a reasonable range, Chris Insalaco and Shishir Ghate partitioned the MAP into 7 ASIC
chips. To accommodate the various implementations of the MAP and future modifications, a
universal VHDL model with variable output delay was designed. The VHDL models for
parts of the MAP circuit in the FPGA version (CHIP A, CHIP B, CHIP C and XFIFO) are
included in the appendix for further reference.
Figure 6.1: The Blanking Sequence of the MAP
6.2.1 Ports
The following list is the port description of the MAP.
clk : is the on-board system clock.
w_clk : is the clock for the window buffer. The new window value is shifted in by this
clock. It allows the system to load a new set of window values into the
window buffers while the current MIP process is still running.
w : is the input for a new window value. It is connected with the SD bus.
x0 to x6 : are the row inputs for the image buffer.
rowblnk : is generated by the Master Controller to indicate the row blank status.
colblnk : is generated by the Master Controller to indicate the column blank status.
max : indicates the operation of the MAP. Max is '1' for dilation and '0' for erosion.
start_proc : instructs the MAP to start processing by latching the new window and
operations on the rising edge of the next clock. Also, the target pixel on the next clock
is the first valid pixel of the new image.
y : is an output port for a pixel from the resultant image.
xo : is an output port for a pixel from the original image.
6.2.2 Processes
Several constants in the architecture are used to specify the pipelined stages.
CONSTANT ROW    : integer := 7;
CONSTANT OPTION : integer := 7;         -- 7 more delay for jandj
CONSTANT YDELAY : integer := 5+OPTION;
The ROW constant declares the row size of the window. YDELAY is the number of pipeline
stages, excluding the delay due to the input and output stages. OPTION is the number of
extra pipeline stages introduced by splitting the MAP into different chips.
The first eight processes in the model are used to convert the bus values into
integers. They are not discussed since similar code has been shown in the ALU1 process
load_new_const_and_op.
The map_7x7_rowblnk_read_and_convert process converts the 6-bit rowblnk vector into the
7-element vector rowb. The center element, rowb(3), is never blanked. The resultant vector
is more compatible with the 7 x 7 window.
map_7x7_rowblnk_read_and_convert:
PROCESS
  VARIABLE i: integer;
BEGIN
  WAIT ON rowblnk;
  FOR i IN 0 TO 2 LOOP
    CASE (rowblnk(i)) IS
      WHEN '1' =>
        rowb(i) <= SET;
      WHEN '0' =>
        rowb(i) <= RESET;
      WHEN 'X' | 'Z' =>
        rowb(i) <= UNKNOWN;
    END CASE;
  END LOOP;
  rowb(3) <= RESET;
  FOR i IN 3 TO 5 LOOP
    CASE (rowblnk(i)) IS
      WHEN '1' =>
        rowb(i+1) <= SET;
      WHEN '0' =>
        rowb(i+1) <= RESET;
      WHEN 'X' | 'Z' =>
        rowb(i+1) <= UNKNOWN;
    END CASE;
  END LOOP;
END PROCESS map_7x7_rowblnk_read_and_convert;
When the w_clk is active, one word is shifted into the window buffer array, wreg.
Because wreg is a signal array, whose new value is not visible within the same delta interval,
a variable array, temp_w, is used for shifting the values. The variable is then copied back
to the original signal array.
shift_w_process:
PROCESS
  VARIABLE temp_w : window_array;
BEGIN
  WAIT ON neww, w_clk UNTIL w_clk='1';
  FOR i IN SIZE-1 DOWNTO 1 LOOP
    temp_w(i) := wreg(i-1);
  END LOOP;
  temp_w(0) := neww;
  wreg <= temp_w;
END PROCESS shift_w_process;
The clk_process is synchronized with the system clock. Every clock-related statement
is placed in this process. Since variables are used for signal propagation, it should be
noted that a new variable value is visible to later statements in
the same delta interval. The statements must therefore be ordered so that a variable is
not read after it has been updated within the same clock. For instance, if a variable is the
output of statement A and the input of statement B, statement B must be placed before
statement A. Otherwise, the change made by statement A would propagate incorrectly to the
output of statement B in the same clock period.
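The ordering rule can be seen in a two-stage delay built from variables. This is a minimal sketch, not part of the thesis model: the entity var_order and the names d, q, stage0 and stage1 are hypothetical.

```vhdl
-- Hypothetical two-stage pipeline illustrating statement ordering with variables.
ENTITY var_order IS
  PORT (clk : IN bit; d : IN integer; q : OUT integer);
END var_order;

ARCHITECTURE behav OF var_order IS
BEGIN
  PROCESS
    VARIABLE stage0, stage1 : integer := 0;
  BEGIN
    WAIT UNTIL clk = '1';
    q      <= stage1;   -- read stage1 before it is overwritten
    stage1 := stage0;   -- statement B: reads stage0 before it is updated
    stage0 := d;        -- statement A: writes stage0 last
  END PROCESS;
END behav;
```

If statements A and B were swapped, stage1 would receive d within the same clock edge and one stage of delay would be lost, which is the error the paragraph above warns about.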
The delayx and delayy are two output buffers used to implement the output delay. The
number of delay stages is defined by the constants XDELAY and YDELAY.
clk_process:
PROCESS
  -- ... variable declaration deleted
BEGIN
  WAIT ON startp, clk UNTIL clk = '1';
  -- SHIFT X0
  FOR i IN XDELAY-1 DOWNTO 1 LOOP
    delayx(i) := delayx(i-1);
  END LOOP;
  delayx(0) := xreg((SIZE-1)/2);
  -- SHIFT Y0
  FOR i IN YDELAY-1 DOWNTO 1 LOOP
    delayy(i) := delayy(i-1);
  END LOOP;
  delayy(0) := newy;
The statements shown next are to find the maximum or minimum value of the summation
according to the value of maxreg, the MIP operation register. The function compare has
been discussed in the previous section.
  -- COMPARE BLOCK
  temp_cmp := sum(0);
  FOR i IN SIZE-1 DOWNTO 0 LOOP
    IF (compare(temp_cmp, sum(i), maxreg) = TRUE) THEN
      temp_cmp := sum(i);
    END IF;
  END LOOP;
  newy <= temp_cmp;
The image pixels and the window registers are added in the following statements. The
column and row blanking status is handled here.
  -- ADDER BLOCK
  FOR i IN ROW-1 DOWNTO 0 LOOP
    FOR j IN ROW-1 DOWNTO 0 LOOP
      temp_sum(i*ROW+j) := adder(xreg(i*ROW+j), wlreg(i*ROW+j),
                                 rowb(i), colb(j));
    END LOOP;
  END LOOP;
  sum <= temp_sum;
The image pixels stored in the FIFO are then shifted in on every clock from the register array
newx, whose values are the integers converted from the bit vectors of the ports x0 to x6.
These pixels are shifted through the buffer xreg.
  -- SHIFT X
  FOR i IN ROW-1 DOWNTO 0 LOOP
    FOR j IN ROW-1 DOWNTO 1 LOOP
      temp_x(i*ROW+j) := xreg(i*ROW+j-1);
    END LOOP;
    temp_x(i*ROW) := newx(i);
  END LOOP;
  xreg <= temp_x;
If the start_proc is active, the new values of the window registers and the MAP operation
are latched on the rising edge of the next clock.
  IF (startp = SET) THEN
    maxreg <= newmax;
    wlreg <= wreg;
  END IF;
The integer outputs are converted to type qsim_state_vector for the output ports.
  out_gen(delayx(XDELAY-1), xotemp);
  out_gen(delayy(YDELAY-1), yotemp);

  xo <= TRANSPORT xotemp AFTER DELAY;
  y <= TRANSPORT yotemp AFTER DELAY;
END PROCESS clk_process;
The function adder simulates the adders used in the MAP. The additional arguments are
rb for the row blanking status and cb for the column blanking status. If either rb or cb is
set, the output is -∞ (MFIN).
FUNCTION adder (a,b,rb,cb : integer)
  -- rb: state of row blanking signal
  -- cb: state of column blanking signal
  RETURN integer IS
  VARIABLE sum : integer;
BEGIN
  IF ((a = UNKNOWN) OR (b = UNKNOWN)
      OR (rb = UNKNOWN) OR (cb = UNKNOWN)) THEN
    sum := UNKNOWN;
  ELSIF ((a = MFIN) OR (b = MFIN)
      OR (rb = SET) OR (cb = SET)) THEN
    sum := MFIN;
  ELSIF ((a >= 0) AND (b >= 0) AND ((a + b) > MAXNUM)) THEN
    sum := MAXNUM;
  ELSIF ((a < 0) AND (b < 0) AND ((a + b) < MINNUM)) THEN
    sum := MINNUM;
  ELSE
    sum := a + b;
  END IF;
  RETURN sum;
END adder;
6.3 ALU2
The ALU2 has three external inputs: the pixel value of the resultant image from the
MAP, the pixel value of the original image from the MAP, and the pixel
value from an on-board memory bank. Only two of the three inputs are connected to the
internal inputs. The functionality of the ALU2 is similar to that of the ALU1, with two
additional operations: finding the maximum or the minimum pixel value of an image. The
output of the ALU2 goes to the Volume Adder and a selected on-board memory.
6.3.1 Ports
The following are the input and output ports of the ALU2:
clk: is the on-board system clock.
a: connects to the MAP's output port y, which carries the pixel of the resultant image
from the morphological operations.
b: connects to the MAP's output port xo, which carries the target pixel of the
morphological operations.
c: is the pixel from the on-board memory bank through the X2 BUS.
start_alu2: indicates the start of a new image on the next clock period and instructs the
ALU2 to load the new operation and constant.
stop_alu2: indicates the end of an image and instructs the ALU2 to latch the
maximum/minimum search result.
sd_in: is the input data bus for operations and constants.
sd_out: is the output data bus for the maximum/minimum output registers, 000307 and 000308.
The register 000307 stores bits 7 to 0 of the maximum/minimum value and the
register 000308 stores bit 8 of the value.
regsln: is an active low signal of the register-select status for the register 000307.
regshn: is an active low signal of the register-select status for the register 000308.
regenw: is an active high signal for the register write enable.
regenr: is an active high signal for the register read enable.
alu2_out: is the pixel output of the ALU2.
6.3.2 Processes
Most of the processes in the ALU2 are identical to those in the ALU1, which
have been discussed in section 6.1. Only the processes that are unique to the ALU2 are
discussed in this section. One of the differences between ALU1 and ALU2 is that the
data bus is unidirectional for the ALU1 and bidirectional for the ALU2. Since System 1076
version 7.0 does not support the INOUT port type, the bidirectional bus is handled by the
process sd_handle, which connects sd_in and sd_out inside the process in order to emulate
the INOUT port type.
sd_handle:
PROCESS
BEGIN
  wait on sd_in, sd_buf;
  if sd_in'EVENT then
    sd_buf <= sd_in;
    sd_out <= "ZZZZZZZZ";
  elsif sd_buf'EVENT then
    sd_out <= sd_buf;
  end if;
END PROCESS sd_handle;
Another difference is that the ALU1 has two inputs, but the ALU2 has three inputs.
Only two of the three inputs of the ALU2 can be used as operands. The selection is
controlled by the register xbusmode, which maps the value of bits 2 and 3 of the register
000308. The selected inputs are copied into the internal registers, aff and bff, as operands.
load_new_aff_and_bff:
PROCESS
BEGIN
  WAIT ON clk, xbusmode UNTIL clk='1';
  CASE xbusmode IS
    WHEN NORM =>
      aff <= newa;
      bff <= newb;
    WHEN CTOA =>
      aff <= newc;
      bff <= newb;
    WHEN CTOB =>
      aff <= newa;
      bff <= newc;
    WHEN CAB =>
      aff <= newc;
      bff <= newc;
    WHEN OTHERS =>
      aff <= UNKNOWN;
      bff <= UNKNOWN;
  END CASE;
END PROCESS load_new_aff_and_bff;
The ALU2 can search for the maximum or the minimum pixel in an
image. The following code supports this functionality. For the maximum search, the register
called loopback stores the current maximum value. If the input pixel value is larger than
the value in the loopback, the value in the loopback is replaced by the input a. In
addition, a is copied to the output register outff, so the output image is a copy of the
input image. The procedure for the minimum search is identical to that for the maximum,
except that the register loopback contains the minimum value instead of the maximum
value of the image.
    WHEN MIN2AL => IF (compare(a, loopback, MIN) = TRUE) THEN
        loopback := loopback;
      ELSE
        loopback := a;
      END IF;
      outff := a;

    WHEN MAX2AL => IF (compare(a, loopback, MAX) = TRUE) THEN
        loopback := loopback;
      ELSE
        loopback := a;
      END IF;
      outff := a;
The search result in the loopback is latched into the register regout when the stop_alu2
signal is active. The regout is the combination of the output registers 000307 and 000308.
stop_alu2_process:
PROCESS
BEGIN
  WAIT ON clk UNTIL clk='1' AND stop_alu2 = '1';
  regout <= loopback;
END PROCESS stop_alu2_process;
The output register 000307 or 000308 is read out when its address is selected and the
read enable is active. The output register, regout, stores the maximum/minimum value.
When the address 000307 is selected (regsln='0'), bits 7 to 0 are read out through sd_out.
When the address 000308 is selected (regshn='0'), bit 8 is read out. When the read enable
is inactive, the sd_out value is Z (high impedance).
regenr_process:
PROCESS (regenr)
  VARIABLE temp, i, choice : integer;
BEGIN
  -- process bits 0-7 of regout
  IF (regenr = '1') AND (regsln = '0') THEN
    temp := regout;
    IF (temp < 0) THEN
      temp := temp + 512;
    END IF;
    temp := temp mod 256;   -- regout & 0xff (the original listing reads "mod 255")
    FOR i IN 0 TO 7 LOOP
      choice := temp mod 2;
      temp := temp / 2;
      CASE (choice) IS
        WHEN 1 =>
          sd_buf(i) <= TRANSPORT '1' AFTER DELAY;
        WHEN 0 =>
          sd_buf(i) <= TRANSPORT '0' AFTER DELAY;
        WHEN OTHERS =>
          NULL;
      END CASE;
    END LOOP;
  -- process the 8th bit of regout
  ELSIF (regenr = '1') AND (regshn = '0') THEN
    FOR i IN 1 TO 7 LOOP
      sd_buf(i) <= TRANSPORT '0' AFTER DELAY;
    END LOOP;
    temp := regout;
    IF (temp < 0) THEN
      temp := temp + 512;
    END IF;
    IF (((temp / 2**8) mod 2) = 1) THEN
      sd_buf(0) <= TRANSPORT '1' AFTER DELAY;
    ELSE
      sd_buf(0) <= TRANSPORT '0' AFTER DELAY;
    END IF;
  -- no operation
  ELSIF (regenr = '0') THEN
    FOR i IN 0 TO 7 LOOP
      sd_buf(i) <= TRANSPORT 'Z' AFTER DELAY;
    END LOOP;
  END IF;
END PROCESS regenr_process;
6.4 Volume Adder
The Volume Adder accumulates either the absolute value or the squared value of the
pixels output by the ALU2 and stores the result in registers. The squared values
of all 8-bit integers are calculated in advance and stored in ROM as a look-up table.
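A look-up table of squares can be sketched as follows. This is an illustration only: the package square_rom, the function build_rom and the entity square_lookup are hypothetical names, pixels are carried as integers, and the input range -255 to 255 is an assumption (the undefined value -256 is not handled here).

```vhdl
-- Hypothetical sketch of a squared-value look-up table (not the thesis model).
PACKAGE square_rom IS
  TYPE rom_array IS ARRAY (0 TO 255) OF integer;
  FUNCTION build_rom RETURN rom_array;
END square_rom;

PACKAGE BODY square_rom IS
  -- fill the table once with i*i for every 8-bit magnitude
  FUNCTION build_rom RETURN rom_array IS
    VARIABLE rom : rom_array;
  BEGIN
    FOR i IN rom'RANGE LOOP
      rom(i) := i * i;
    END LOOP;
    RETURN rom;
  END build_rom;
END square_rom;

USE work.square_rom.ALL;
ENTITY square_lookup IS
  PORT (x  : IN  integer RANGE -255 TO 255;   -- pixel value (assumed range)
        xs : OUT integer);                    -- squared value
END square_lookup;

ARCHITECTURE behav OF square_lookup IS
  CONSTANT ROM : rom_array := build_rom;      -- table built at elaboration
BEGIN
  -- the magnitude of x addresses the table; no multiplier is needed at run time
  xs <= ROM(ABS x);
END behav;
```

Because the table is indexed by the magnitude of the pixel, the same 256 entries serve positive and negative inputs, and the largest entry is 255 * 255 = 65025.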
6.4.1 Ports
The following list is the port description of the Volume Adder.
clk : is the on-board system clock.
x : connects with the output of the ALU2.
xs : connects with the squared output of the ALU2.
start_alu2 : indicates that the first valid pixel will arrive after two clock periods. It
instructs the Volume Adder to reset and then start accumulating the input values.
stop_alu2 : indicates that the last valid pixel will arrive after three clock periods. It
instructs the Volume Adder to stop adding and store the volume in the output
registers.
regenw : is the register write enable signal. When it is active and bit 0 of the regsn
is low, the Volume Adder latches the new operation value from sd_in.
regenr : is the register read enable signal. When it is active, the contents of the address
selected by regsn are sent to sd_out.
regsn : is the active low signal for address selection. Bits 0 to 4 are used to select
the registers 000300 to 000304.
sd_in : is the input port from the SD bus.
sd_out : is the output port to the SD bus.
6.4.2 Processes
The regenw_process latches the new Volume Adder operation into the buffer mode_l.
regenw_process:
PROCESS
BEGIN
  WAIT ON newsd, regsnreg, regenw UNTIL regenw='1' AND regsnreg=REG0;
  mode_l <= newsd mod 2;   -- newsd & 0x01
END PROCESS regenw_process;
The regenr_process reads out one of the registers REG0 to REG4, whose hexadecimal
addresses are 000300 to 000304, to the SD bus.
regenr_process:
PROCESS
  VARIABLE temp_sd : qsim_state_vector(7 DOWNTO 0);
BEGIN
  WAIT ON regsnreg, regenr UNTIL regenr='1';
  CASE (regsnreg) IS
    WHEN REG0 =>
      out_gen(out0, temp_sd);
    WHEN REG1 =>
      out_gen(out1, temp_sd);
    WHEN REG2 =>
      out_gen(out2, temp_sd);
    WHEN REG3 =>
      out_gen(out3, temp_sd);
    WHEN REG4 =>
      out_gen(out4, temp_sd);
    WHEN OTHERS => NULL;
  END CASE;
  sd_buf <= TRANSPORT temp_sd AFTER DELAY3;
END PROCESS regenr_process;
The values of the start_alu2 and stop_alu2 are buffered into variables start and stop
whenever they change.
clk_process:
PROCESS
  -- . . . variable declarations deleted
BEGIN
  WAIT ON start_alu2, stop_alu2, clk;
  IF (start_alu2 = '0') THEN
    start := RESET;
  ELSE
    start := SET;
  END IF;
  IF (stop_alu2 = '0') THEN
    stop := RESET;
  ELSE
    stop := SET;
  END IF;
When stop2 equals SET, the last valid pixel has been processed; the result is then
partitioned into 8-bit fields and stored in the output registers out0 to out4.
  IF (clk = '1') THEN
    IF (stop2 = SET) THEN
      out0 <= sum MOD 16#100#;                 -- sum & 0xff
      out1 <= (sum/2**8) MOD 16#100#;          -- (sum & 0xff00)>>8
      out2 <= cnt18 MOD 16#100#;               -- cnt18 & 0xff
      out3 <= (cnt18/2**8) MOD 16#100#;        -- (cnt18 & 0xff00)>>8
      temp_out4 := ((cnt18/2**17) MOD 2) * 2;  -- (cnt18 & 0x20000)>>16
      IF (sign = SET) THEN
        temp_out4 := temp_out4 + 16#80#;       -- out4 |= 0x80
      END IF;
      out4 <= temp_out4;
    END IF;
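The register partitioning above can be restated as a short Python sketch. The function name pack_volume and its argument names are hypothetical; the bit operations follow the comments in the listing, including the use of bit 17 of the counter only for out4.

```python
def pack_volume(sum16, cnt18, sign):
    """Split the accumulator state into five 8-bit register values."""
    out0 = sum16 & 0xFF                 # low byte of the 16-bit sum
    out1 = (sum16 >> 8) & 0xFF          # high byte of the 16-bit sum
    out2 = cnt18 & 0xFF                 # low byte of the overflow counter
    out3 = (cnt18 >> 8) & 0xFF          # second byte of the overflow counter
    out4 = ((cnt18 >> 17) & 1) * 2      # mirrors ((cnt18/2**17) MOD 2) * 2
    if sign:
        out4 += 0x80                    # sign flag occupies bit 7 of out4
    return (out0, out1, out2, out3, out4)
```

For example, pack_volume(0x1234, 0x2ABCD, True) yields the bytes 0x34, 0x12, 0xCD, 0xAB, and 0x82.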
When start1 equals SET, the Volume Adder resets its internal registers. Otherwise, the
Volume Adder adds the input value on every clock. If the sum exceeds 16 bits, the
18-bit counter stores the remaining upper bits.
    IF (start1 = SET) THEN
      sum <= addin;
      cnt18 <= 0;
      sign <= RESET;
      mode_sel <= mode_l;
    ELSE
      temp_sum := adder(sum, addin);
      sum <= temp_sum MOD 16#10000#;
      IF (temp_sum > 16#ffff#) THEN
        cnt18 <= cnt18+1;
      END IF;
    END IF;
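The accumulation scheme, a 16-bit sum plus a counter of carries out of bit 15, can be sketched in Python as follows. The class VolumeAccumulator is a hypothetical illustration that ignores signal-update delays and the sign and mode logic of the VHDL model.

```python
class VolumeAccumulator:
    """16-bit running sum with a separate counter for carries."""

    def __init__(self):
        self.sum16 = 0
        self.cnt18 = 0

    def clock(self, addin, start):
        if start:
            # Reset: load the first operand and clear the carry counter.
            self.sum16 = addin
            self.cnt18 = 0
        else:
            temp = self.sum16 + addin
            self.sum16 = temp % 0x10000      # keep the low 16 bits
            if temp > 0xFFFF:
                self.cnt18 += 1              # count one carry out of bit 15

    def volume(self):
        """Recombine carry counter and sum into the full total."""
        return self.cnt18 * 0x10000 + self.sum16
```

Under these assumptions, volume() always equals the plain sum of the operands since the last reset, which is a convenient check when comparing against simulation results.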
During every clock period, the start value is shifted through two flip-flops. This creates a
two-clock-period delay, since it takes two clock periods for the ALU2 to generate the first
valid pixel. The stop value goes through three flip-flops to indicate that the last valid pixel is
done.
    start0_temp := start0;
    stop1_temp := stop1;
    stop0_temp := stop0;
    start1 <= start0_temp;
    start0 <= start;
    stop2 <= stop1_temp;
    stop1 <= stop0_temp;
    stop0 <= stop;
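The flip-flop chains can be pictured with a small Python sketch. DelayLine is a hypothetical helper; it assumes every stage updates on the same clock edge and returns the value of the last stage after that edge.

```python
class DelayLine:
    """Chain of flip-flops that all update together on each clock edge."""

    def __init__(self, stages):
        self.regs = [0] * stages

    def clock(self, value):
        # Shift the new value in at the front; the oldest value falls off the end.
        self.regs = [value] + self.regs[:-1]
        return self.regs[-1]
```

With two stages, a one-clock pulse at the input appears at the output on the second edge; with three stages it appears on the third, mirroring the start and stop paths described above.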
The mode_sel signal selects the adder input: either the squared input, newxs, or the
absolute value taken from newx.
    IF (mode_sel = SET) THEN
      addin <= newxs;
    ELSE
      addin <= newx MOD 16#100#;
    END IF;
The variable sign is a flag which shows that a negative value has been detected at the input.
    IF (sign /= SET) AND (sig:
      sign <= SET;
    END IF;
    IF ((newx/2**8) = 1) THEN
      sign0 <= SET;
    ELSE
      sign0 <= RESET;
    END IF;
  END IF;
END PROCESS clk_process;
All models were simulated with Mentor Graphics' Quicksim. The do files
for the ALU1 and ALU2 were provided by Jeff Hanzlik. The results match the specification.
Chapter 7
Memory Units
The on-board memory chip model and the tri-state buffers on the local buses are
described in this chapter.
7.1 Memory
This memory model is a general read-only memory behavioral model. A more accurate
model should be provided by the manufacturer for a specific commercial chip in a future
timing model of the system.
7.1.1 Ports
D: is the input of new contents.
Q: is the output of the contents.
ADDR: is the selected address.
WEn : is an active-low write enable signal.
CSn : is an active-low chip select.
7.1.2 Process
The chip must be selected for either read or write (CSn='0'). Then, if WEn='0', the
memory updates the contents at the address selected by ADDR. In write mode,
the output value of the memory is UNKNOWN because the value on a real circuit
is unstable during a write. If WEn is '1', the chip is read-enabled and the contents at the
selected address are output to the port Q.
memory_main_process:
PROCESS
  VARIABLE memory: memory_type;
BEGIN
  wait on D_REG, ADDR_REG, WEn, CSn;
  if (CSn='0') then
    if ( (ADDR_REG<0) or (ADDR_REG>(RAM_SIZE-1)) ) then
      assert FALSE
        report "INVALID MEMORY ADDRESS"
        severity WARNING;
    else
      case (WEn) is
        when '0' => memory(ADDR_REG) := D_REG;  -- INPUT
                    Q_REG <= transport UNKNOWN after MEMR_DELAY;
                    -- OUTPUT UNKNOWN
        when '1' => Q_REG <= transport memory(ADDR_REG)
                             after MEMR_DELAY;
                    -- OUTPUT
        when others =>
          assert FALSE
            report "INVALID WEn VALUE"
            severity WARNING;
      end case;
    end if;
  end if;
END PROCESS memory_main_process;
END memory_behavior;
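As a cross-check of the behavior described above, the following Python sketch mimics the same decisions without timing. MemoryModel, access, and the use of None for UNKNOWN are assumptions made for illustration; MEMR_DELAY is not modeled.

```python
import warnings

UNKNOWN = None  # stand-in for the simulator's UNKNOWN value


class MemoryModel:
    """Untimed sketch of the chip-select / write-enable behavior."""

    def __init__(self, size):
        self.cells = [UNKNOWN] * size
        self.q = UNKNOWN

    def access(self, addr, data, we_n, cs_n):
        if cs_n != 0:
            return self.q                     # chip not selected: output held
        if not 0 <= addr < len(self.cells):
            warnings.warn("INVALID MEMORY ADDRESS")
            return self.q
        if we_n == 0:
            self.cells[addr] = data           # write: store the input
            self.q = UNKNOWN                  # output is unknown while writing
        elif we_n == 1:
            self.q = self.cells[addr]         # read: drive the stored value
        else:
            warnings.warn("INVALID WEn VALUE")
        return self.q
```

A write followed by a read of the same address returns the written value, while any access with CSn high leaves both the contents and the output untouched.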
7.2 Buffer
This is a tri-state buffer. The output of the buffer is controlled by the enable pin on the
buffer: when the enable pin is active (low), the buffer is turned on and the input signal passes
through the buffer to the output pin; otherwise, the buffer is turned off and the output of
the buffer is high impedance.
7.2.1 Ports
The following list is the port description of the buffer.
BUF_IN: is the input of the buffer.
BUF_OUT: is the output of the buffer.
En : is the active-low buffer enable signal.
7.2.2 Process
BUF_IN passes to BUF_OUT only when the enable signal is active.
BEGIN
  with En select
    BUF_OUT <= high_impedence when '1',
               BUF_IN when '0',
               buf_unknown when others;
END buffer_behavior;
7.2.3 Further Implementation
Table 7.1 is a four-state resolution table for the tri-state signals. Since signal resolution
is not supported in System 1076 version 7.0, the actual signal is resolved by a BLM that follows
the same table.
              IN2
          Z   0   1   X
IN1   Z   Z   0   1   X
      0   0   0   X   X
      1   1   X   1   X
      X   X   X   X   X
Table 7.1: Tri-state Signal Resolving Table
The following data type and code can be used for a tri-state bus signal in a fully
implemented VHDL system:
TYPE tri_bus IS ARRAY (INTEGER range <>) OF qsim_state_vector;
FUNCTION bus_resolve(unsolved_bus_vectors: tri_bus)
    return qsim_state_vector is
  VARIABLE temp_vector: qsim_state_vector;
  VARIABLE case_vector: qsim_state_vector(1 downto 0);
begin
  temp_vector := unsolved_bus_vectors(unsolved_bus_vectors'LOW);
  for i in unsolved_bus_vectors'RANGE loop
    for j in unsolved_bus_vectors(i)'RANGE loop
      case_vector := temp_vector(j) & unsolved_bus_vectors(i)(j);
      case case_vector is
        when "ZZ" | "Z0" | "Z1" | "ZX" =>
          temp_vector(j) := unsolved_bus_vectors(i)(j);
        when "00" | "0Z" =>
          temp_vector(j) := '0';
        when "11" | "1Z" =>
          temp_vector(j) := '1';
        when others =>
          temp_vector(j) := 'X';
      end case;
    end loop;
  end loop;
  return temp_vector;
end bus_resolve;
SUBTYPE res_bus_vector is bus_resolve qsim_state_vector;
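The resolution rule of Table 7.1 can be checked independently of any VHDL tool with a short Python sketch. The functions resolve_bit and resolve_bus are hypothetical names, and bus values are represented as strings over the characters Z, 0, 1, and X.

```python
from functools import reduce


def resolve_bit(a, b):
    """Resolve two drivers of one wire according to Table 7.1."""
    if a == "Z":
        return b                 # a floating driver yields to the other one
    if b == "Z":
        return a
    if a == b and a in ("0", "1"):
        return a                 # two drivers agreeing on a logic level
    return "X"                   # conflict, or an unknown is involved


def resolve_bus(drivers):
    """Resolve several equally wide bus vectors bit by bit."""
    return "".join(reduce(resolve_bit, bits) for bits in zip(*drivers))
```

For instance, resolve_bus(["ZZ01", "Z1Z1", "0ZZX"]) gives "010X": the first three bit positions each have a single active driver, and the last one involves an unknown.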
Chapter 8
Conclusion
In this project, the theory of the MIP has been studied, and the behavioral and structural
models of the MIP have been established as well as simulated in VHDL. The behavioral
model can be used for future system development since it incorporates the functionality as
well as the timing information of the MIP from the design specifications.
The structural model, on the other hand, can be used to document the designed MIP
system. A shared BLM testbench has been used for both models, and the simulation
results are identical for a 32 x 32 image. Although the MIP is designed for 512 x 512 image
processing, it is sufficient to simulate the MIP based on a 32 x 32 image since the boundary
conditions between the two are the same. It is clearly an advantage to use a behavioral
VHDL model when simulating the MIP since it is impractical to simulate the MIP at a gate
level.
The MIP system is partitioned into separate functional blocks: I/O unit, Control Units,
Arithmetic Units, and Memory Units. While the behavioral model is based on the
functionality of the MIP, the structural model is based on the partitioned functional blocks.
Each functional block shown in figure 5.3 is composed of one or more VHDL models
corresponding to their physical blocks. Since the objective of this project was to provide the
documentation for the MIP system, the VHDL models are constructed only for the physical
blocks in the system.
Although the mission of documenting the MIP system has been accomplished with this
project, it is possible for interested readers to further extend the project. One possibility
is to create a VHDL library for logic elements at gate level and use these basic models to
construct the structural models of the physical blocks currently represented by the
behavioral models. Another possibility is to create a new MIP using a synthesis tool based on the
current structural model to further decompose the structural models into finer functional
blocks.
Bibliography
[1] C. R. Giardina and E. R. Dougherty, Morphological Methods in Image and Signal
Processing, Prentice-Hall, Inc., 1987.
[2] R. M. Haralick and L. G. Shapiro, Computer and Robot Vision, Addison-Wesley Pub.
Co., 1992, Ch. 5.
[3] P. Maragos and R. W. Schafer, Morphological Systems for Multidimensional Signal
Processing, Proceedings of the IEEE, Vol. 78, No. 4, April 1990, pp. 690-710.
[4] P. Maragos, Tutorial on Advances in Morphological Image Processing and Analysis,
Optical Engineering, Vol. 26, No. 7, July 1987, pp. 623-632.
[5] S. Weber, A View from the Top, ASIC & EDA - Technologies for System Design, June
1992, pp. 14-16.
[6] J. Bhasker, A VHDL Primer, Prentice Hall, 1992.
[7] W. K. Pratt, Digital Image Processing, John Wiley & Sons, Inc., 1991.
[8] J. Hanzlik, private communication.
[9] J. Rodenberg, unfinished M.S. thesis, chapter 4.
[10] J. Rodenberg, unfinished M.S. thesis, chapter 5.
Appendix A
Bus Interface
We will discuss the Bus Interface between the host computer and the MIP board in this
chapter. The Bus Interface accepts commands from the host computer and generates the
corresponding register control and read/write signals for the various on-board components.
Before discussing the implementation of the circuit modeling in VHDL, we will describe the
input and output signals of the Bus Interface.
A.1 Input and Output Signals
It should be realized that a signal name without the letter "n" at the end indicates that the
signal is active high, while a signal with "n" at the end indicates that the signal is active
low.
RESET: is used to generate a power-on reset to bring the MIP to a known state before its
operation.
BCLK: is an 8-MHz bus clock from the host computer. The clock is used to synchronize
the output signals generated by the Bus Interface.
BALE: updates address information for the Bus Interface when it is high.
SA(15:0): generates register control signals REGSn and register read/write signals through
the IO_Decoder in the Bus Interface.
LA(23:19): provide memory address information about the present bus cycle.
The information is latched on the falling edge of the BALE signal.
SD(7:0): contains information to generate the control signals.
MEMWn: generates the control signals for the data transfer from the system bus to
on-board memory.
MEMRn: generates the control signals for the data transfer from the on-board memory to
the host computer.
IOWn: generates the control signals for the data transfer from the system bus to an I/O
register.
IORn: generates the control signals for the data transfer from an I/O register to the host
computer.
SW(1:0): configures the addresses for I/O registers. Table A.1 shows the correspondence
between SW(1:0) and the addresses for I/O registers.
SW(1:0)   I/O Address
00        000300-00030f
01        000310-00031f
10        000320-00032f
11        000330-00033f
Table A.1: I/O Address
The current design connects both SW1 and SW0 to ground.
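The mapping in Table A.1 follows a simple pattern: each switch setting moves the 16-byte I/O window up by 0x10. A minimal Python sketch, with the hypothetical function name io_address_range, makes the pattern explicit.

```python
def io_address_range(sw):
    """Return (first, last) I/O address for a 2-bit SW(1:0) setting."""
    if not 0 <= sw <= 3:
        raise ValueError("SW(1:0) is a 2-bit value")
    base = 0x300 + 0x10 * sw      # each setting shifts the window by 16 bytes
    return base, base + 0x0F
```

With both switches grounded (sw = 0), the sketch returns the range 0x300 to 0x30F, the first row of the table.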
The output signals are described next.
RESETn: is the inverse of RESET and provides a power-on reset to bring the MIP to a
known state before its operation.
CS(3:0): selects the corresponding Memory Control chip.
WR_ENn: connects the data path between the PC bus and the input ports of all memories.
RD_ENn: connects the data path between the output ports of all memories and the PC
bus.
PROC_RWn: connects the data path between the input port of the selected memory and the
output port (Y_out) of ALU2.
PC_WEn: generates the write enable signal for a Memory Controller when a data transfer
from the host computer to the specified memory is requested.
REGSn(13:0): are the register control signals used by the Master Controller, the ALU1,
the MAP, the ALU2, and the Volume Adder.
REGENW: updates the register specified by the I/O address.
REGENR: enables the host computer to read the register specified by the I/O address.
REGENRn: is the inverse of the output signal REGENR. This signal is not used in the
VHDL model.
W_CLK: generates the write enable signal for loading the mask.
MEMCS16n: indicates a 16-bit data transfer on the present bus cycle.
A VLSI version of the Bus Interface does not exist. Chris Insalaco decided to use the
FPGA version of the Bus Interface instead. In the next section, the VHDL model of the circuit
will be discussed.
A.2 VHDL Model of the Bus Interface
The schematic of the Bus Interface is depicted in figure A.1. The VHDL model is based
on this circuit and the BLM model of the old design. There are seven processes in the
model. Processes bus_la and bus_sd produce intermediate signals, while processes bus_reset,
bus_bale, bus_iown, bus_memwn, and pc_wen generate the output signals based on the
intermediate and input signals.
Process bus_la essentially models the Memory Decoder in the Bus Interface. The signal
validmem indicates whether the address presented on the PC bus is valid for the current cycle.
The signal balmem is used to activate MEMCS16n.
bus_la:
PROCESS
  VARIABLE la_dummy: qsim_state_vector(4 DOWNTO 0);
  VARIABLE la_bus: integer := UNKNOWN;
  VARIABLE baseok: integer;  -- equivalent to sd(6)
BEGIN
  WAIT ON la, addrctrl, bale;
Figure A.1: Schematic of BUS Interface
  baseok := (addrctrl/64) MOD 2;
  la_dummy := la;
  la_bus := in_gen(la_dummy);
  IF ((addrctrl MOD 64) = la_bus) AND (baseok = 1) THEN
    validmem <= '1';
    IF bale = '1' THEN
      balmem <= '0';
      validmem_not <= '0';  -- output of U7
    END IF;
  ELSE
    validmem <= '0';
    balmem <= '1';
    IF bale = '1' THEN
      validmem_not <= '1';  -- output of U7
    END IF;
  END IF;
END PROCESS bus_la;
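The address comparison at the heart of bus_la can be summarized in a few lines of Python. The function name memory_address_valid is hypothetical; the sketch covers only the validmem decision and leaves out the BALE-gated signals.

```python
def memory_address_valid(addrctrl, la):
    """Return True when the latched LA bits match the Address Control Register."""
    base_ok = (addrctrl >> 6) & 1          # bit 6 enables the comparison
    base = addrctrl & 0x3F                 # low six bits hold the base address
    return base_ok == 1 and base == la
```

Because LA(23:19) carries five bits while six bits of the register are compared, a match in this sketch requires bit 5 of the register to be zero.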
Process bus_sd generates the intermediate signals addrctrl and segment. Addrctrl contains the
contents of the Address Control Register, while segment contains the contents of
the Segment Control Register, which controls the memory segments. In addition, this process
generates the output signals CS(3:0), which are determined by the Segment Control Register.
Bit 5 of segment is a flag for mask load or memory access. When the flag is 0, bits 4 and
3 select one of the memory controllers to use the address from the PC bus for memory
access. When the flag is 1, no memory controller is selected and each memory uses the
address generated by its own controller.
bus_sd:
PROCESS
  -- VARIABLE DECLARATIONS
BEGIN
  WAIT ON sd, resetn_inter, regenw_inter, regs;
  sd_val := in_gen(sd);
  IF resetn_inter = '0' THEN
    addrctrl <= 0;
    segment <= 0;
  ELSIF regenw_inter'EVENT AND regenw_inter = '1' THEN
    IF regs = 16#7fff# THEN
      addrctrl <= sd_val;
    ELSIF regs = 16#bfff# THEN
      segment_dummy := sd_val;
      segment <= sd_val;
    END IF;
  END IF;
  index := (segment_dummy/8) MOD 4;
  IF (segment_dummy/32 MOD 2) = 0 THEN  -- segment(5)
    CASE index IS
      WHEN 0 => cs_val := 1;
      WHEN 1 => cs_val := 2;
      WHEN 2 => cs_val := 4;
      WHEN 3 => cs_val := 8;
      WHEN OTHERS => cs_val := 0;
    END CASE;
  ELSE
    cs_val := 0;
  END IF;
  out_gen(cs_val, cs_dummy);
  cs <= TRANSPORT cs_dummy AFTER CS_DLY;
END PROCESS bus_sd;
END bus_interface_behavior;
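The chip-select decoding in bus_sd amounts to a one-hot encoding of segment bits 4 and 3, gated by bit 5. A minimal Python sketch, using the hypothetical name chip_select, shows the same mapping.

```python
def chip_select(segment):
    """Return the one-hot CS(3:0) value for a Segment Control Register value."""
    if (segment >> 5) & 1:
        return 0                              # mask-load mode: nothing selected
    return 1 << ((segment >> 3) & 0b11)       # bits 4..3 pick one controller
```

For example, a segment value of 0b010000 selects controller 2 and yields CS = 4, while any value with bit 5 set yields CS = 0.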
Process bus_reset generates the output signal RESETn by inverting the input signal reset.
Process bus_bale produces the output signal MEMCS16n.
bus_bale:
PROCESS
BEGIN
  WAIT ON bale, bclk, balmem;
  IF balmem = '0' THEN
    memcs16n <= TRANSPORT '0' AFTER MEMCS_DLY;
  ELSE
    IF bale = '0' AND bclk = '0' THEN
      memcs16n <= TRANSPORT '1' AFTER MEMCS_DLY;
    END IF;
  END IF;
END PROCESS bus_bale;
Process bus_iown models the component IO_DECODER. The process first decodes
bits SA3 through SA0 to generate the register control signals REGSn(15:0); it then activates
REGENR and REGENRn if the host computer issues an I/O read. The last output signal
processed is REGENW, which is controlled by a state machine.
bus_iown:
PROCESS
  -- VARIABLE DECLARATIONS
BEGIN
  WAIT ON sa, iorn, iown, resetn_inter, bclk, sw0, sw1;
  sa_val := in_gen(sa);  -- read PC SA bus
  -- evaluate the new regsn signals
  index := sa_val MOD 16;
  regs_val := REGS_VALUES(index);
  regs <= regs_val;
  out_gen(regs_val, regsn_dummy);
  FOR i IN regsn'RANGE LOOP
    regsn(i) <= TRANSPORT regsn_dummy(i) AFTER REGSN_DLY;
  END LOOP;
  regsn_14 <= regsn_dummy(14);
  regsn_15 <= regsn_dummy(15);
  -- change regenr and regenrn accordingly
  IF (sa_val/64 = 16#c#) AND (sw1 = sa(5)) AND (sw0 = sa(4))
     AND iorn = '0' AND iown = '1' THEN
    regenr <= TRANSPORT '1' AFTER REGSN_DLY;
    regenrn <= TRANSPORT '0' AFTER REGSN_DLY;
  ELSE
    regenr <= TRANSPORT '0' AFTER REGSN_DLY;
    regenrn <= TRANSPORT '1' AFTER REGSN_DLY;
  END IF;
  -- construct the regenw state machine
  IF resetn_inter = '0' THEN
    regw := 0;
  ELSIF bclk'EVENT AND bclk = '1' THEN  -- edge triggered on bclk
    IF regw = 0 AND iown = '0' THEN
      regw := 1;
    ELSIF regw = 1 THEN
      regw := 2;
    ELSIF regw = 2 AND iown = '1' THEN
      regw := 0;
    END IF;
  END IF;
  -- to generate regenw and regenw_inter
  IF (sa_val/64 = 16#c#) AND (sw1 = sa(5)) AND (sw0 = sa(4))
     AND regw = 1 AND iorn = '1' THEN
    regenw_dummy := '1';
  ELSE
    regenw_dummy := '0';
  END IF;
  regenw_inter <= regenw_dummy;
  regenw <= TRANSPORT regenw_dummy AFTER OUT_DLY;
END PROCESS bus_iown;
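The three-state machine that shapes REGENW can be traced with a small Python sketch. The function regw_next is a hypothetical restatement of the transitions in the listing, evaluated once per rising edge of bclk; the reset path and the address qualification are left out.

```python
def regw_next(state, iown):
    """Next state of the regenw state machine on a rising bclk edge."""
    if state == 0 and iown == 0:
        return 1          # I/O write detected: enter the active state
    if state == 1:
        return 2          # active for exactly one clock, then wait
    if state == 2 and iown == 1:
        return 0          # write strobe released: back to idle
    return state          # otherwise hold the current state
```

Tracing an IOWn strobe that stays low for three clocks shows the state passing through 1 only once, so under these assumptions REGENW is a single-clock pulse however long the host holds the strobe.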
The memory I/O, on the other hand, is handled by the process bus_memwn. Note
that the signal PROC_RWn is generated by inverting bit 5 of the segment register. Since
bit 5 is used as a flag to indicate memory access or mask load, an image has to be loaded
into the on-board memory before the mask is loaded. Otherwise, the resultant image will not
be stored in the specified memory. The generation of the output signals RD_ENn, WR_ENn,
and W_CLK is straightforward.
bus_memwn:
PROCESS
BEGIN
  WAIT ON memrn, memwn, segment, validmem_not, bclk;
  IF (segment/32 MOD 2) = 1 THEN
    proc_rwn <= TRANSPORT '0' AFTER OUT_DLY;
  ELSIF (segment/32 MOD 2) = 0 THEN
    proc_rwn <= TRANSPORT '1' AFTER OUT_DLY;
  END IF;
  IF memrn = '0' AND memwn = '1' AND ((segment/32 MOD 2) = 0)
     AND validmem_not = '0' THEN
    rd_enn <= TRANSPORT '0' AFTER OUT_DLY;
  ELSE
    rd_enn <= TRANSPORT '1' AFTER OUT_DLY;
  END IF;
  IF memrn = '1' AND memwn = '0' AND ((segment/32 MOD 2) = 0)
     AND validmem_not = '0' THEN
    wr_enn <= TRANSPORT '0' AFTER OUT_DLY;
  ELSE
    wr_enn <= TRANSPORT '1' AFTER OUT_DLY;
  END IF;
  IF memrn = '1' AND memwn = '0' AND ((segment/32 MOD 2) = 1)
     AND validmem_not = '0' THEN
    w_clk <= TRANSPORT '1' AFTER OUT_DLY;
  ELSE
    w_clk <= TRANSPORT '0' AFTER OUT_DLY;
  END IF;
  -- construct the pc_wen state machine
  IF bclk'EVENT AND bclk = '0' THEN  -- negative edge triggered
    IF memrn = '1' AND memwn = '0' AND ((segment/32 MOD 2) = 0)
       AND validmem_not = '0' AND wrenn = 0 THEN
      wrenn <= 1;
    ELSE
      wrenn <= 0;
    END IF;
  END IF;
END PROCESS bus_memwn;
The state machine in the above process is used to generate the signal PC_WEn, which is
simply the inversion of WRENn.
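The combinational part of bus_memwn can be condensed into a truth-function sketch in Python. The name memory_strobes and the dictionary return value are illustrative assumptions; delays and the PC_WEn state machine are not modeled.

```python
def memory_strobes(memrn, memwn, segment, validmem_not):
    """Return the combinational outputs of bus_memwn as 0/1 values."""
    mask_load = (segment >> 5) & 1            # segment bit 5: 1 = mask load
    selected = validmem_not == 0              # address decoded as valid
    read = memrn == 0 and memwn == 1
    write = memrn == 1 and memwn == 0
    return {
        "proc_rwn": 0 if mask_load else 1,
        "rd_enn": 0 if (read and not mask_load and selected) else 1,
        "wr_enn": 0 if (write and not mask_load and selected) else 1,
        "w_clk": 1 if (write and mask_load and selected) else 0,
    }
```

The sketch makes the role of bit 5 visible: the same memory write cycle drives WR_ENn low in memory-access mode but raises W_CLK in mask-load mode.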
It should be realized that the current design of the Bus Interface is only for 512 x 512
images. Since bit 0 in the Address Control Register is compared with the address bit 19 on
the PC bus, the base address has to be reconfigured in software when the address bit 19 is
changed for an image of 1024 x 1024.
Appendix B
Controller
In the previous chapter, the interface between the host computer and the MIP board was
described. In this chapter we explore the control mechanism of the MIP. There are two
types of controllers in the MIP system: the Master Controller and the Memory Controller.
The Master Controller is responsible for synchronizing operations between the ALU1, the
MAP, the ALU2, and the Volume Adder. The Memory Controller is in charge of assigning
memory for a source or a destination image. There are four Memory Controllers, each
controlling its own memory unit. The following section is devoted to the Master Controller.
B.1 Master Controller
We have mentioned earlier that the Master Controller is used to synchronize operations
between the different image processing units, namely the ALU1, the MAP, the ALU2, and the
Volume Adder. Therefore, the outputs of the Master Controller are connected with all of
the image processing units as well as the Memory Controller. Most of the inputs are from
the bus interface. The inter-connections between the different circuit component models
can be found in figure 5.3. We will discuss in this section the inputs and outputs of the
Master Controller, the differences between the FPGA version and VLSI version in circuit
design, and the implementation of the circuit modeling in VHDL.
B.1.1 Inputs and Outputs
Most of the input and output descriptions are taken from materials provided by Jens
Rodenberg [10]. We will first describe the inputs. Again, it should be realized that a signal
name without the letter "n" at the end indicates that the signal is active high, while a signal
with "n" at the end indicates that the signal is active low.
REGS_STARTn: selects the start register when REGENW is also active. The selected
register will start the processor.
REGS_PIn: selects the processor's instruction/status register when either REGENW or
REGENR is also active. The register contains two write-only instruction bits (bits 0
and 1) and two read-only status bits (bits 6 and 7). Bit 0 selects the MAP operation,
either erosion (bit 0 = 0) or dilation (bit 0 = 1); bit 1 is the bus mode
selection, which assigns the X2 bus to be either the input of ALU1 (when bit 1 = 0) or
the input of ALU2 (when bit 1 = 1); bit 6 is raised high when the processor is ready
to accept the next instruction; bit 7 is high when the processor finishes the processing.
REGS_MSn: selects the memory select register when REGENW is also active. The register
provides the information on the memory locations of the source image and destination
image for an image processing operation. The Master Controller uses this information to route
memory control signals to the appropriate Memory Controller, to select the
appropriate bus to connect the ALU1, the MAP and the ALU2 with their corresponding
memories, and to allow the host computer to read an image from any of the on-board
memories. The details were described in Table 4.6 in chapter 4.
REGENW: enables the host computer to write to one of the three registers described above.
REGENR: enables the host computer to read from the processor's instruction/status
register described above.
SD(7:0): is the 8-bit data bus for register reads and register writes.
SIZE: selects the image size to be either 512 x 512 (when SIZE = 0) or 1024 x 1024 (when
SIZE = 1).
PL_START: starts the pipelined processor.
S0_2: adjusts the number of delays, which is determined by the MAP operation.
CLK: is the on-board system clock.
CLRn: clears all flip-flops upon a power up of the system.
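The bit layout of the instruction/status register selected by REGS_PIn can be illustrated with a short Python decoder. The function name decode_pi_register and the dictionary keys are hypothetical; the bit positions follow the description in the list above.

```python
def decode_pi_register(value):
    """Decode the four documented bits of the instruction/status register."""
    return {
        "dilation": bool(value & 0x01),     # bit 0: 0 = erosion, 1 = dilation
        "x2_to_alu2": bool(value & 0x02),   # bit 1: 0 = X2 feeds ALU1, 1 = ALU2
        "ready": bool(value & 0x40),        # bit 6: ready for next instruction
        "done": bool(value & 0x80),         # bit 7: processing finished
    }
```

Note that, according to the description, bits 0 and 1 are write-only and bits 6 and 7 are read-only, so a value read back from the board would be expected to carry only the two status bits.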
The output signals are described next.
PL_START_NEXT: generates the start signal for the next pipelined processor.
START_MEM(3:0): issues start signals to the corresponding Memory Controllers specified
by the active bits. The Memory Controller will start its address counter from zero
upon the next rising clock edge after receiving the start signal.
WRITE_MEM(3:0): instructs the corresponding Memory Controller specified by the active
bit that its associated memory will be written to, starting when the START_MEM
signal is issued. This will cause the selected Memory Controller to issue the write
enable signal to its memory during valid memory addresses.
X1_BUSSELn: connects the memory specified by the active bit to the X1 bus by enabling
the buffer between them.
X2_BUSSELn: connects the memory specified by the active bit to the X2 bus by enabling
the buffer between them.
INIT_ALU1: instructs the ALU1 to load its next instruction upon the next rising clock
edge. INIT_ALU1 also informs the ALU1 that the first valid pixel of an image is
its operand upon the next rising clock edge.
MAX: specifies whether the operation of the MAP is erosion or dilation.
START_PROC: starts the MAP operation upon the next rising clock edge. The MAP uses
this signal to latch the window values and the morphological operation for the next
image being processed.
ROWBLNK(5:0): informs the MAP which rows are to be blanked.
COLBLNK(5:0): informs the MAP which columns are to be blanked.
INIT_ALU2: instructs the ALU2 and the Volume Adder to load their next instructions
upon the next rising clock edge and resets the ALU2 and the Volume Adder.
STOP_ALU2: informs the ALU2 and the Volume Adder that their operands upon the next
rising clock edge will no longer be valid pixels for the current operation.
The input and output signals described above are based on the VLSI version. The FPGA
version does not include the input signals SIZE, PL_START, and S0_2; it does not generate
the output signal PL_START_NEXT. The detailed differences between the two versions are
described in the next subsection.
B.1.2 VLSI Version vs. FPGA Version
There are two differences in terms of the functionality of the Master Controller between
the two versions. One is that the VLSI version is capable of generating the control signals
for either 512 x 512 images or 1024 x 1024 images, while the FPGA version is fixed for
512 x 512 images. The actual size of the image in the VLSI version is determined by the
jumper connecting the input SIZE to either VCC or ground. When SIZE is connected to
ground, the image is 512 x 512; when SIZE is connected to VCC, the image is 1024 x 1024.
The other difference is that the VLSI version is designed to adapt to pipelined operations:
it accepts PL_START to start the processor and generates PL_START_NEXT for the next
processor.
Other differences between the two versions in circuit design exist as well. Although the
differences do not affect the functionality of the Master Controller, they do affect the
implementation of the VHDL model. The first difference occurs at the start register, which is a D
flip-flop. In the FPGA version, the start signal at the output of the start register stays high
for only one clock period regardless of the duration of its input signal. In the VLSI version,
however, the start signal at the output of the register stays high as long as the input is
high. The second difference is in the number of delay stages, which is determined by
the MAP. In the FPGA version, it requires 15 clock cycles (or stages) for the target pixel
to go through the MAP and enter the ALU2. In the VLSI version, it requires only 8 stages.
The input S0_2 in the VLSI version is used to adjust the number of delay stages. The third
difference is related to the Start Blank and Blank Counter blocks and will be discussed with the
VHDL modeling of that circuit in the next subsection. The schematics of the VLSI and
FPGA versions are shown in figures B.1 and B.2.
Figure B.1: Schematic of Master Controller in the VLSI version
Figure B.2: Schematic of Master Controller in the FPGA version
B.1.3 VHDL Model of the Master Controller
The VHDL model for the Master Controller is based on the BLM model of the FPGA
version and extended to adopt the new features of the VLSI version. The Master Controller
in either version can be divided into three stages according to the signal flow: the
input stage, the clocked stage, and the output stage. The first stage is the input stage. It
accepts the input signals from the input ports and from the clocked stage, then generates
the output signals to the other two stages as well as to the output ports. The second stage is
the clocked stage. It accepts the signals from the input stage and input ports, and generates
the output signals for the input stage, output stage, and output ports. The input CLK is
only used in the second stage to synchronize the signals. The third stage is the output
stage, which accepts signals from the other two stages as well as from the input ports, and
generates the signals for the output ports. The inter-connections between the stages are
shown in figure B.3. In the VHDL model, only these interconnections are defined as signals
to connect the different processes. This minimizes the use of signals.
Each of the stages is modeled with one or more processes. We will discuss each process
according to the stages classified above.
Input Stage
The input stage consists of four processes: sd_handler, switch, ctrl_reg, and gen_start.
Sd_handler is used to handle the bidirectional data bus. Since the INOUT port type was
not implemented in System 1076 version 7.0, a buffer called sd_buffer is used to mimic the
bidirectional bus.
sd_handler:
PROCESS
BEGIN
  WAIT ON sd_in, sd_buffer;
  IF sd_in'EVENT THEN
    sd_buffer <= sd_in;
    sd_out <= "ZZZZZZZZ";
  ELSIF sd_buffer'EVENT THEN
    sd_out <= sd_buffer;
  END IF;
END PROCESS sd_handler;
The process ctrl_reg generates the signals bus_mode, max, and mem_instr. Bus_mode
and max have been explained in section B.1.1. Mem_instr configures START_MEM(3:0),
WRITE_MEM(3:0), X1_BUSSELn(3:0), and X2_BUSSELn(3:0). The details for
configuring memories and buses were described in table 4.6.
ctrl_reg:
PROCESS
  -- VARIABLE declarations.
BEGIN
  WAIT ON regenw, regs_pin, regs_msn, sd_buffer,
          clrn, start;
  IF clrn = '0' THEN
    bus_mode <= '0';
  ELSE
    IF regenw = '1' AND regs_pin = '0' THEN
      max <= TRANSPORT sd_buffer(0) AFTER DELAY_MAX;
      IF start = '1' THEN
        bus_mode <= sd_buffer(1);
      END IF;
    END IF;
    IF (regenw = '1' AND regs_msn = '0') THEN
      mem_instr <= in_gen(sd_buffer);
    END IF;
  END IF;
END PROCESS ctrl_reg;
The process switch is separated from the process ctrl_reg since its input signals are
fixed for any MIP board. Simulation would be inefficient if switch and ctrl_reg were
combined into one process. The input signals for the process are SIZE and S0_2, which were
described in section B.1.1. The output signals are sb_max, bc_max, and s0_2_val. Sb_max
and bc_max are the maximum counts that the Start Blank Counter and the Blank Counter
reach, respectively.
switch:
PROCESS(size, s0_2)
  VARIABLE s0_2_val_dummy : integer;
BEGIN
  CASE size IS
    WHEN '0' => sb_max <= 32*3+4+2-1;  -- should be 512*3+5
                bc_max <= 31;          -- should be 511
    WHEN '1' => sb_max <= 64*3+4+2-1;  -- should be 1024*3+5
                bc_max <= 63;          -- should be 1023
    WHEN OTHERS =>
    -- . . . lines omitted
  END CASE;
  s0_2_val_dummy := in_gen(s0_2);
  s0_2_val <= s0_2_val_dummy;
  IF s0_2_val /= UNKNOWN THEN
    del_length_l <= 6+s0_2_val_dummy;
  ELSE
    -- . . . lines omitted
  END IF;
END PROCESS switch;
It should be mentioned that the values of sb_max and bc_max used in the listing above
are chosen to limit the simulation time. The actual values for the MIP are given in the
comments. Since both counters start from 0, the corresponding maximum is one less than
the size of the counter. The general formula for the blank count maximum is
$$bc\_max = \text{row size} - 1. \tag{B.1}$$
Blank Counter generates the row and column blanking signals for the MAP. The done signal
from the Blank Counter indicates to the MAP that the last pixel is its operand.
The general formula for the start blank maximum, however, differs between the VLSI
version and the FPGA version. In the VLSI version,
$$sb\_max = \frac{N-1}{2} \times \text{row size} + \frac{N+1}{2} + 2 - 1. \tag{B.2}$$
N is the window size, fixed at 7 for a 7 × 7 window. For N = 7 and a row size of 512,
equation (B.2) gives 512 × 3 + 5, which is the value quoted in the comments of the switch
process. Since Start Blank synchronizes the operations of the ALU1 and the MAP, the number
of clock periods between the rising edge of INIT_ALU1 and the rising edge of START_PROC
should be $\frac{N-1}{2} \times \text{row size} + \frac{N+1}{2}$. Two
extra clock cycles are needed to account for the delay in the operation of the ALU1. Referring
to figure B.1, INIT_ALU1 is generated one clock cycle after the start signal
of the Start Blank Counter. In addition, one more clock cycle is needed in the ALU1 to
perform either addition or comparison before the target pixel is sent to the MAP.
In the FPGA version,
$$sb\_max = \frac{N-1}{2} \times \text{row size} + \frac{N+1}{2} + 2 - 1 - 1. \tag{B.3}$$
The difference of one clock cycle between the two versions is due to different designs for
the blank counter. Combinational logic is used in the VLSI version, while a state machine is
used in the FPGA version. The result is that the row blank signals and column blank signals
in the FPGA version are activated one clock after the start signal of the Blank Counter is
activated. On the other hand, the blank signals in the VLSI version are activated in the
same clock cycle as the start signal of Blank Counter is activated. Therefore, a delay is
inserted after the Blank Counter's start signal in the FPGA version to generate start_proc,
which should be synchronized with the first row/column blank signals. To account for the
inserted delay, the count in Start Blank is one less than the required number.
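As a quick check of equations (B.1) to (B.3), the following self-checking sketch evaluates them for N = 7. It is only an illustration under assumed names: the entity blank_max_demo and the functions bc_max, sb_max_vlsi, and sb_max_fpga are hypothetical and do not appear in the thesis model, and the code is written in present-day standard VHDL rather than the system 1076 version 7.0 dialect of the listings.

```vhdl
-- Illustrative sketch only; all names here are assumptions, not thesis code.
entity blank_max_demo is
end entity blank_max_demo;

architecture sketch of blank_max_demo is
    constant N : positive := 7;  -- window size, 7 x 7 as stated in the text

    -- Equation (B.1): the counter starts at 0, so its maximum is size - 1.
    function bc_max (row_size : positive) return natural is
    begin
        return row_size - 1;
    end function bc_max;

    -- Equation (B.2), VLSI version.
    function sb_max_vlsi (row_size : positive) return natural is
    begin
        return ((N - 1) / 2) * row_size + (N + 1) / 2 + 2 - 1;
    end function sb_max_vlsi;

    -- Equation (B.3), FPGA version: one clock cycle fewer than (B.2).
    function sb_max_fpga (row_size : positive) return natural is
    begin
        return sb_max_vlsi(row_size) - 1;
    end function sb_max_fpga;
begin
    check : process
    begin
        assert bc_max(512) = 511
            report "bc_max mismatch" severity error;
        -- Value quoted in the comment of the switch process for a 512 row.
        assert sb_max_vlsi(512) = 512*3 + 5
            report "VLSI sb_max mismatch" severity error;
        -- Reduced value used in the switch process to shorten simulation.
        assert sb_max_vlsi(32) = 32*3 + 4 + 2 - 1
            report "reduced sb_max mismatch" severity error;
        assert sb_max_fpga(512) = 512*3 + 4
            report "FPGA sb_max mismatch" severity error;
        wait;  -- stop after one pass
    end process check;
end architecture sketch;
```

Keeping the FPGA value as "VLSI value minus one" mirrors the text's explanation that the two versions differ by exactly one clock cycle.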
The process gen_start generates the start signal for the MIP operation. It accepts the
signal start_buf from the clocked stage and the signal from the input port PL_START, and
generates the output signal start.
    gen_start:
    PROCESS
    BEGIN
        WAIT ON pl_start, start_buf;
        start <= start_buf OR pl_start;
    END PROCESS gen_start;
Clocked Stage
There is only one process, clocked_blocks, in this stage. This process is the core of the
Master Controller since it deals with all the synchronized signals for the ALU1, the MAP,
the ALU2 and the Volume Adder operations. It generates pro_rdy, pro_done, x1_en, x2_en,
start_x1, start_x2, and start_y for the output stage; it generates signals for the output ports
INIT_ALU1, INIT_ALU2, START_PROC, STOP_PROC, ROWBLNKs, COLBLNKs, and
PL_START_NEXT; it also generates the signal start_buf, which in turn generates the start
signal for the MIP. Since the output ports have been described in section 7.1.1, we will explain
here only the output signals which are not connected to the output ports.
Pro_rdy: is the input signal of the output stage that indicates that the processor is ready to
accept the next instruction. The signal is bit 6 of the instruction/status register.
Pro_done: is the input signal of the output stage that indicates that the processor has finished
the processing. The signal is bit 7 of the instruction/status register.
X1_en: enables the first latch in BUS_SELECT to generate X1_BUSSELn.
X2_en: enables the second latch in BUS_SELECT to generate X2_BUSSELn.
Start_x1, start_x2, start_y: are the input signals of the output stage used to generate the
output ports START_MEM(3:0) and WRITE_MEM(3:0).
Start_buf: is the output of the start register. It is the input signal of the process gen_start
in the input stage.
Before the code for the process is presented, we will briefly describe the related part of the
circuit in figure B.1 to explain the basic functions being modeled. Upon receiving the
start signal, START_BLANK starts to count from 0. When sb_max is reached, it generates
START_PROC. Meanwhile, the same signal is fed into BLANK_COUNTER to initialize
the counter. As described in section 7.1.1, START_PROC indicates that the target pixel
in the MAP is the first pixel of an image. The various delay stages are required for the
MAP to perform the operation. After the pixel is processed, it enters the ALU2. This
event is signaled by INIT_ALU2. STOP_ALU2 becomes active bc_max + 1 clock cycles after
INIT_ALU2 becomes active. In addition, start_x1 is the same signal as start, but start_x2
depends on the bus mode, which was described in section 7.1.1.
The information within the process is passed by using variables. Therefore, it is
extremely important to order the statements in the right sequence so that the correct
propagation of the information is guaranteed. The rule of thumb is to assign the value of
a variable to the last clock stage first, and to the first clock stage last. Since the code is well
documented, we will not explain it in more detail.
    clocked_blocks:
    PROCESS
        ... VARIABLE declarations.
    BEGIN
        WAIT ON clk, clrn;
        IF clrn = '0' THEN
            ... Initialization of the registers.
        ELSIF clk'EVENT AND clk = '1' THEN
            -- To generate pro_rdy and pro_done signals
            IF start = '1' THEN
                pro_rdy <= '0';
                pro_done <= '0';
            ELSE
                IF inter_init_alu2 = '1' THEN
                    pro_rdy <= '1';
                END IF;
                IF inter_stop_alu2 = '1' THEN
                    pro_done <= '1';
                END IF;
            END IF;
            -- To generate x1_en, x2_en, and start_y output signals.
            dummy_start := ((NOT regs_startn) AND regenw) OR pl_start;
            start_x1 <= dummy_start;
            x1_en <= start;
            x2_en <= start_x2;
            start_y <= inter_init_alu2;

            -- To propagate the internal signals of the blocks through
            -- the variable assignment. The order in assigning the
            -- value is extremely important: the assignment should
            -- start from the last stage in the chain.
            inter_pl_start_next := inter_init_alu2;
            inter_init_alu1 := start;
            inter_init_alu2 := del1_last;
            inter_stop_alu2 := del2_last;
            FOR i IN MAX_DEL_1 DOWNTO 1 LOOP
                del1(i) := del1(i-1);
                del2(i) := del2(i-1);
            END LOOP;
            del1(0) := sb_done;
            del2(0) := bc_done;
            CASE s0_2_val IS
                WHEN 0 => del1_last := del1(6);
                          del2_last := del2(6);
                ...
                WHEN 7 => del1_last := del1(13);
                          del2_last := del2(13);
                WHEN OTHERS => null;
            END CASE;

            -- start_x2 is level sensitive to its inputs. Therefore,
            -- the assignment is placed after the assignment of its
            -- input variables.
            IF bus_mode = '1' THEN
                start_x2 <= del1_last;
            ELSE
                start_x2 <= dummy_start;
            END IF;

            -- Process the blank counter next
            IF sb_done = '1' THEN
                rowb_cnt_dummy := 0;
                colb_cnt_dummy := 0;
                bc_done := '0';
            ELSIF rowb_cnt_dummy = bc_max AND colb_cnt_dummy = bc_max THEN
                rowb_cnt_dummy := 0;
                colb_cnt_dummy := 0;
            ELSIF colb_cnt_dummy = bc_max THEN
                colb_cnt_dummy := 0;
                rowb_cnt_dummy := rowb_cnt_dummy+1;
            ELSE
                colb_cnt_dummy := colb_cnt_dummy+1;
            END IF;
            IF rowb_cnt_dummy = bc_max AND colb_cnt_dummy = bc_max THEN
                bc_done := '1';
            ELSE
                bc_done := '0';
            END IF;
            -- To process the start blank counter. This part has to be
            -- placed after blank counter since the variable sb_done
            -- used in blank counter is generated here.
            IF start = '1' THEN
                sb_cnt := 0;
                sb_done := '0';
            ELSIF sb_done = '1' THEN
                sb_done := '0';
            ELSIF sb_cnt = sb_max-1 THEN
                sb_done := '1';
                sb_cnt := sb_cnt+1;
            ELSE
                sb_cnt := sb_cnt+1;
            END IF;

            -- To process the regs_startn signal
            IF regenw = '1' AND regs_startn = '0' THEN
                start_buf <= '1';
            ELSE
                start_buf <= '0';
            END IF;
        END IF;

        -- To generate the row and column blanks for their output ports.
        IF rowb_cnt_dummy = 0 THEN
            out_gen(16#38#, rowblnk_temp);
        ELSIF rowb_cnt_dummy = 1 THEN
            out_gen(16#30#, rowblnk_temp);
        ELSIF rowb_cnt_dummy = 2 THEN
            out_gen(16#20#, rowblnk_temp);
        ELSIF rowb_cnt_dummy = (bc_max-2) THEN
            out_gen(16#01#, rowblnk_temp);
        ELSIF rowb_cnt_dummy = (bc_max-1) THEN
            out_gen(16#03#, rowblnk_temp);
        ELSIF rowb_cnt_dummy = bc_max THEN
            out_gen(16#07#, rowblnk_temp);
        ELSE
            out_gen(16#0#, rowblnk_temp);
        END IF;
        ... The generation of column blank is identical as above.
        -- Output the signals to the corresponding output ports.
        start_proc <= TRANSPORT sb_done AFTER DELAY_OTHERS;
        init_alu1 <= TRANSPORT inter_init_alu1 AFTER DELAY_OTHERS;
        init_alu2 <= TRANSPORT inter_init_alu2 AFTER DELAY_OTHERS;
        stop_alu2 <= TRANSPORT inter_stop_alu2 AFTER DELAY_OTHERS;
        pl_start_next <= TRANSPORT inter_pl_start_next AFTER DELAY_OTHERS;
        rowblnk <= TRANSPORT rowblnk_temp AFTER DELAY_OTHERS;
        colblnk <= TRANSPORT colblnk_temp AFTER DELAY_OTHERS;
    END PROCESS clocked_blocks;
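The ordering rule used in the listing above (assign the last stage first) can be illustrated in isolation with a three-stage chain. This is a minimal sketch under assumed names: the entity delay_chain_sketch, its ports, and the variables stage1 to stage3 are hypothetical, and the code is written in present-day standard VHDL with the IEEE std_logic_1164 package rather than in the dialect of the thesis model.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative sketch only; names are assumptions, not thesis code.
entity delay_chain_sketch is
    port (
        clk  : in  std_logic;
        clrn : in  std_logic;   -- active-low clear
        d    : in  std_logic;
        q    : out std_logic    -- output of the last stage
    );
end entity delay_chain_sketch;

architecture sketch of delay_chain_sketch is
begin
    chain : process (clk, clrn)
        variable stage1, stage2, stage3 : std_logic := '0';
    begin
        if clrn = '0' then
            stage1 := '0';
            stage2 := '0';
            stage3 := '0';
        elsif rising_edge(clk) then
            -- Last stage first: each stage reads the value its
            -- predecessor held before this clock edge.
            stage3 := stage2;
            stage2 := stage1;
            stage1 := d;
        end if;
        q <= stage3;
    end process chain;
end architecture sketch;
```

Because a variable takes its new value immediately, writing the three assignments in the opposite order (stage1 first) would copy d into all three stages on a single clock edge, and the chain would no longer behave as a shift register.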
Output Stage
There are two processes in this stage. The process mem_bus_reg models BUS_SELECT
and MEM_SELECT in figure B.1. X1_BUSSELn(3:0) and X2_BUSSELn(3:0) are
generated by BUS_SELECT; START_MEM(3:0) and WRITE_MEM(3:0) are generated by
MEM_SELECT. The following code models the functionality of BUS_SELECT and
MEM_SELECT. The decoding of mem_instr (memory instruction) is based
on integer manipulation rather than logic manipulation, because the functionality to be
described is better understood with an integer representation.
    mem_bus_reg:
    PROCESS
        ... VARIABLE declarations
    BEGIN
        WAIT ON mem_instr, start_x1, start_x2, start_y,
                x1_en, x2_en, clrn;
        -- To generate the value for mem_sel_buffer
        IF clrn = '0' THEN
            mem_sel_buffer := 0;
        ELSIF start_x1 = '1' THEN
            mem_sel_buffer := mem_instr;
        END IF;

        -- To generate the value for writing memory
        w_mem_val := 0;
        FOR i IN 0 TO MEM_SEL_INDX LOOP
            IF extract_bits(mem_sel_buffer,2*i+1,2*i) = 2 THEN
                w_mem_val := w_mem_val+2**i;
            END IF;
        END LOOP;
        out_gen(w_mem_val, temp_write_mem);
        -- To generate the value for starting memory
        FOR i IN 0 TO MEM_SEL_INDX LOOP
            CASE extract_bits(mem_sel_buffer,2*i+1,2*i) IS
                WHEN 0 => temp := start_x1;
                WHEN 1 => temp := start_x2;
                WHEN 2 => temp := start_y;
                WHEN OTHERS => temp := '0';
            END CASE;
            IF temp = '0' THEN
                temp_value := 0;
            ELSIF temp = '1' THEN
                temp_value := 1;
            END IF;
            CASE i IS
                WHEN 0 => start_mem_val := temp_value;
                WHEN 1 => start_mem_val := 2*temp_value
                                           +start_mem_val;
                WHEN 2 => start_mem_val := 4*temp_value
                                           +start_mem_val;
                WHEN 3 => start_mem_val := 8*temp_value
                                           +start_mem_val;
                WHEN OTHERS => null;
            END CASE;
        END LOOP;
        out_gen(start_mem_val, temp_start_mem);
        -- To generate the value for selecting x1 bus
        IF (extract_bits(mem_instr,7,4) = 0) THEN
            CASE extract_bits(mem_instr,3,0) IS
                WHEN 0 => x1_buf_val := 16#0f#;
                WHEN 1 => x1_buf_val := 16#0e#;
                WHEN 2 => x1_buf_val := 16#0d#;
                WHEN 4 => x1_buf_val := 16#0b#;
                WHEN 8 => x1_buf_val := 16#07#;
                WHEN OTHERS => x1_buf_val := 16#0f#;
                    ASSERT (FALSE)
                    REPORT "Bus conflict in hardware! No bus selected."
                    SEVERITY ERROR;
            END CASE;
            IF clrn = '0' THEN
                x1_buf_val := 16#0f#;
            END IF;
        ELSIF clrn = '0' THEN
            x1_buf_val := 16#0f#;
        ELSIF x1_en = '1' THEN
            x1_buf_val := 0;
            FOR i IN 0 TO MEM_SEL_INDX LOOP
                IF (extract_bits(mem_instr,2*i+1,2*i) /= 0) THEN
                    x1_buf_val := x1_buf_val+2**i;
                END IF;
            END LOOP;
        END IF;
        out_gen(x1_buf_val, x1_bus_dummy);
        -- To generate the value for selecting x2 bus
        IF clrn = '0' THEN
            x2_buf_val := 16#0f#;
        ELSIF x2_en = '1' THEN
            x2_buf_val := 0;
            FOR i IN 0 TO MEM_SEL_INDX LOOP
                IF (extract_bits(mem_instr,2*i+1,2*i) /= 1) THEN
                    x2_buf_val := x2_buf_val+2**i;
                END IF;
            END LOOP;
        END IF;
        out_gen(x2_buf_val, x2_bus_dummy);
        -- Output the signals to the corresponding ports
        write_mem <= TRANSPORT temp_write_mem AFTER DELAY_WRT_STR_MEM;
        start_mem <= TRANSPORT temp_start_mem AFTER DELAY_WRT_STR_MEM;
        x1_bus_seln <= TRANSPORT x1_bus_dummy AFTER DELAY_X_BUS;
        x2_bus_seln <= TRANSPORT x2_bus_dummy AFTER DELAY_X_BUS;
    END PROCESS mem_bus_reg;
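The integer-based decoding in the listing above relies on extract_bits, whose definition is not reproduced in this section. The sketch below shows one way such a bit-field extraction on integers might be written, and replays the write-memory loop on a sample instruction. Everything here is an assumption made for illustration: the function extract_bits_demo, the constant MEM_INSTR_DEMO, and the literal loop bound 3 (standing in for MEM_SEL_INDX with four memories) are hypothetical, and the thesis function may differ.

```vhdl
-- Illustrative sketch only; names are assumptions, not thesis code.
entity extract_bits_sketch is
end entity extract_bits_sketch;

architecture sketch of extract_bits_sketch is
    -- Hypothetical stand-in for extract_bits: returns bits hi..lo of a
    -- non-negative integer as an integer (assumes hi >= lo).
    function extract_bits_demo (word : natural; hi, lo : natural)
        return natural is
    begin
        return (word / 2**lo) mod 2**(hi - lo + 1);
    end function extract_bits_demo;
begin
    check : process
        -- Binary 00 10 01 00: two-bit fields 0, 1, 2, 0 for memories 0 to 3.
        constant MEM_INSTR_DEMO : natural := 16#24#;
        variable w_mem_val      : natural := 0;
    begin
        -- Same pattern as the write-memory loop: a field value of 2
        -- sets the bit of the corresponding memory.
        for i in 0 to 3 loop
            if extract_bits_demo(MEM_INSTR_DEMO, 2*i + 1, 2*i) = 2 then
                w_mem_val := w_mem_val + 2**i;
            end if;
        end loop;
        -- Only memory 2 carries the field value 2, so the mask is 2**2.
        assert w_mem_val = 4
            report "unexpected write mask" severity error;
        wait;  -- stop after one pass
    end process check;
end architecture sketch;
```

Division and modulo by powers of two play the role that shifting and masking would play in a logic-level description, which is what makes the integer form compact.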
The reading of the instruction/status register is modeled by status_reg, which is listed below.
In the process, the variable regrst is used as a switch between the status register and sd_buffer.
The switch is turned on whenever the register is addressed, and turned off
whenever the register is de-addressed. If the switch is in the off position, the value in sd_buffer
is not affected by the instruction/status register. The implementation of the switch
is based on the function of the read enable for the instruction/status register in the actual hardware.
It should be realized that sd_buffer is driven by two sources in the same architecture:
one from status_reg and one from sd_handler. The code is written in such a way that only one
source is turned on at any time. If the code needs to be updated to system 1076 version
8.0, a resolution function should be written for this multi-source driven signal. In system
1076 version 7.0, however, the resolution function is not implemented, and the value on the
multi-driven signal is overwritten by the newest signal value.
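For readers unfamiliar with resolution functions, the following is a minimal sketch of what one could look like for a single multi-driven bit. It is not the function the thesis would need for sd_buffer, whose type is not shown here: the package bus_resolution_sketch, the four-value type tri_bit, the function resolve_tri, and the demo entity are all hypothetical names, and the code is written in present-day standard VHDL.

```vhdl
-- Illustrative sketch only; names and types are assumptions.
package bus_resolution_sketch is
    type tri_bit is ('0', '1', 'Z', 'X');
    type tri_bit_vector is array (natural range <>) of tri_bit;
    function resolve_tri (drivers : tri_bit_vector) return tri_bit;
    subtype resolved_tri_bit is resolve_tri tri_bit;
end package bus_resolution_sketch;

package body bus_resolution_sketch is
    function resolve_tri (drivers : tri_bit_vector) return tri_bit is
        variable result : tri_bit := 'Z';  -- no active driver yet
    begin
        for i in drivers'range loop
            if drivers(i) /= 'Z' then
                if result = 'Z' then
                    result := drivers(i);   -- first active driver wins
                elsif result /= drivers(i) then
                    result := 'X';          -- two drivers disagree
                end if;
            end if;
        end loop;
        return result;
    end function resolve_tri;
end package body bus_resolution_sketch;

use work.bus_resolution_sketch.all;

entity two_driver_demo is
end entity two_driver_demo;

architecture sketch of two_driver_demo is
    signal shared_bit : resolved_tri_bit;
begin
    -- Two sources on one signal; only one is active at any time.
    driver_a : shared_bit <= '1', 'Z' after 10 ns;
    driver_b : shared_bit <= 'Z', '0' after 10 ns;
end architecture sketch;
```

With a resolved subtype, a source that is switched off drives 'Z' instead of relying on the newest assignment overwriting the older one, which is the behavior the paragraph above attributes to version 7.0.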
B.2 Memory Controller
A Memory Controller assigns its corresponding memory to a source image or a destination
image by providing the address and the write enable signals for the memory. In the
following subsections we describe the inputs and outputs of the Memory Controller and
the corresponding VHDL model.
B.2.1 Inputs and Outputs
The number of inputs for the Memory Controller is small. CLK is the on-board system clock,
while CLK2 is a delayed version of CLK and is used to generate proper memory write enable
pulses. Size indicates the size of the image. START_MEM(3:0) and WRITE_MEM(3:0) are
the outputs of the Master Controller and have been described in section 7.1.1. CLRn resets
the memory counter and the write enable register. PC_CS and PC_RD are used to generate
the output signals and are described below.
There are only two outputs from each Memory Controller: MEM_ADDR(19:0) and
WEn. MEM_ADDR(19:0) is the address line for the corresponding memory, and WEn is
the active-low write enable signal. These signals originate from either the host
computer or the Memory Controller, depending on the status of PC_CS. When PC_CS is low,
MEM_ADDR(19:0) and WEn are generated by the Memory Controller itself; when PC_CS
is high, MEM_ADDR(19:0) is connected to PC_ADDR(19:0), and WEn is connected to
PC_RD. Therefore, the host computer is granted access to one of the memories if the
corresponding PC_CS is high.
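The source selection described above amounts to a pair of two-way multiplexers controlled by PC_CS. The sketch below shows that idea alone, without the delays of the thesis model. The entity mem_source_mux_sketch and the port names addr_int and wen_int (the internally generated address and write enable) are assumptions made for illustration, and the code uses present-day standard VHDL with the IEEE std_logic_1164 package.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative sketch only; names are assumptions, not thesis code.
entity mem_source_mux_sketch is
    port (
        pc_cs    : in  std_logic;                      -- host chip select
        addr_int : in  std_logic_vector(19 downto 0);  -- from memory counter
        wen_int  : in  std_logic;                      -- internal write enable
        pc_addr  : in  std_logic_vector(19 downto 0);  -- host address
        pc_rd    : in  std_logic;                      -- host read signal
        mem_addr : out std_logic_vector(19 downto 0);
        wen      : out std_logic                       -- active low
    );
end entity mem_source_mux_sketch;

architecture sketch of mem_source_mux_sketch is
begin
    -- PC_CS high: the host computer owns the memory.
    -- Any other value: the Memory Controller drives it.
    mem_addr <= pc_addr when pc_cs = '1' else addr_int;
    wen      <= pc_rd   when pc_cs = '1' else wen_int;
end architecture sketch;
```

Note that this sketch treats every value of pc_cs other than '1' as "controller selected", a simplification chosen here for brevity.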
B.2.2 VHDL Model of the Memory Controller
The schematic of a Memory Controller chip is shown in figure B.4.
From figure B.4, it is seen that the Memory Controller consists mainly of a memory
counter, three multiplexors, and a register. The memory counter restarts from 0 when
START_MEM is high, or CLRn is low, or the count reaches the highest address. This
function is accomplished by the process mem_counter.
Figure B.4: Schematic of Memory Controller
    mem_counter:
    PROCESS
        VARIABLE count : integer := 0;
    BEGIN
        WAIT ON clk UNTIL clk = '1';
        IF (start_mem = '1' OR clrn = '0' OR count = maxcnt) THEN
            count := 0;
            done <= '0';
        ELSIF start_mem = '0' THEN
            count := count+1;
            IF count = maxcnt THEN
                done <= '1';
            END IF;
        END IF;
        addr <= count;
    END PROCESS mem_counter;
The status of PC_CS determines the source of the corresponding memory address
and of the write enable signal. This is implemented by the processes source_of_mem_addr and
generate_wen, respectively.
    source_of_mem_addr:
    PROCESS
        VARIABLE temp_addr : integer;
        VARIABLE mem_delay : time := 0 ns;
    BEGIN
        WAIT ON pc_cs, addr, new_pc_addr;
        IF pc_cs = '0' THEN
            temp_addr := addr;
            mem_delay := 25 ns;
        ELSIF pc_cs = '1' THEN
            temp_addr := new_pc_addr;
            mem_delay := 40 ns;
        ELSE
            temp_addr := UNKNOWN;  -- This result is not specified in BLM
        END IF;
        new_mem_addr <= TRANSPORT temp_addr AFTER mem_delay;
    END PROCESS source_of_mem_addr;
Addr is the signal generated by the process mem_counter, new_pc_addr is the address
from the host computer, and new_mem_addr is the address for the next memory read/write
cycle.
    generate_wen:
    PROCESS
    BEGIN
        WAIT ON clk2, pc_cs, we, pc_rd;
        IF pc_cs = '0' THEN
            IF clk2 = '1' AND we = '1' THEN
                wen <= TRANSPORT '0' AFTER 30 ns;
            ELSE
                wen <= TRANSPORT '1' AFTER 30 ns;
            END IF;
        ELSIF pc_cs = '1' THEN
            wen <= TRANSPORT pc_rd AFTER 50 ns;
        END IF;
    END PROCESS generate_wen;
The signal we is the output of the write enable register and is generated by the process
generate_we.
    generate_we:
    PROCESS
    BEGIN
        WAIT ON clk UNTIL clk = '1';
        IF clrn = '0' THEN
            we <= '0';
        ELSIF clrn = '1' THEN
            IF start_mem = '0' THEN
                IF done = '1' THEN
                    we <= '0';
                END IF;
            ELSIF start_mem = '1' THEN
                we <= write_mem;
            END IF;
        END IF;
    END PROCESS generate_we;
Appendix C
Utilities
The following utilities were used during this project. The general procedure for creating an
ASCII format image file was:
Scan a photo or graph with the Xerox 7650 Scanner. The output of the scanner is an image
file in TIFF format.
Convert the TIFF format into IMG format. The IMG format can be used by the frame
grabber and the MIP board.
Convert the IMG format into ASCII format. The VHDL model reads the ASCII
format.
Convert the ASCII format into PostScript format. The PostScript format can be printed
on the HP LaserJet Printer.
C.1 XEROX 7650 Scanner
The scanner is connected to the PC "Beaker." Do not turn on the scanner until you
start the scanner program. Type "scan7650" at the DOS prompt to start the scanner
application. Choose the TIFF file format, which may be the only working format.
C.2 TIFF to IMG
Use the program, TIFTOIMG under the directory CONVERT on BEAKER.
C.3 IMG to ASCII / ASCII to IMG and Display an IMG Image
Use the program, IMAGEIO under the directory WEICHUN on Gateway 2000 486/25c.
C.4 ASCII to PS
Use the program, PS under the directory WEICHUN on Gateway 2000 486/25c. For the
512 × 512 images, copy the C program to the APOLLO workstations and recompile the
program.
C.5 Display a PostScript Image on PC
Use the program, GS, under the directory GS. GS is Ghostscript, which can do much more
than display a PS file on screen. Read the documentation.
C.6 Connect PC with Apollo Workstations - DPCI
On the Gateway 2000 486/25c, type "start." After the prompt is shown, type in your
Apollo username and password to log in. The D drive is your Apollo account. The lpt2: port is
connected to the printer lj for ASCII files. The lpt3: port is connected to the printer server fj for
PostScript files. Read the DPCI menu before using it. Before you leave the PC, type "stop"
to disconnect the link.
C.7 Print out a PostScript on LaserJet on Apollo
Use the following command: prf -pr fj -trans FILENAME
