Analysis and hardware implementation of color map inversion algorithms by Martin, Michael
Rochester Institute of Technology
RIT Scholar Works
Theses Thesis/Dissertation Collections
8-1-2002
Analysis and hardware implementation of color
map inversion algorithms
Michael Martin
Follow this and additional works at: http://scholarworks.rit.edu/theses
This Thesis is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for inclusion
in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact ritscholarworks@rit.edu.
Recommended Citation
Martin, Michael, "Analysis and hardware implementation of color map inversion algorithms" (2002). Thesis. Rochester Institute of
Technology. Accessed from
Analysis and Hardware Implementation of
Color Map Inversion Algorithms
by
Michael W. Martin
A Thesis Submitted in Partial Fulfillment
of the Requirements for the Degree of
MASTER OF SCIENCE
Computer Engineering
Approved By:
Principal Advisor
Kenneth Hsu, Professor, Computer Engineering
Committee Member
-------------------
Soheil Dianat, Professor, Electrical Engineering
Committee Member _
Athimoottil Mathew, Professor, Electrical Engineering
Department Head
Andreas Savakis, Associate Professor and Department Head, Computer Engineering
Department of Computer Engineering
College of Engineering
Rochester Institute of Technology
Rochester, New York
August 2002
REPRODUCTION PERMISSION STATEMENT
PERMISSION DENIED
Analysis and Hardware Implementation of
Color Map Inversion Algorithms
I, Michael W. Martin, hereby deny permission to any individual or organization to
reproduce this thesis in whole or in part.
Michael W. Martin
Date
Abstract
The purpose of this thesis is to investigate several algorithms that are used to compute the
inverse of a forward printer map. The forward printer map models the printer by
mapping points in the printer's input color space to points in the printer's output color
space. The inverse of this forward map is required to convert input color specifications in
a device-independent color space to a color in the printer's device-dependent color space
before being presented to the print engine. The accuracy of the inverse printer map
directly affects the accuracy of the reproduced colors. Therefore, any measured change
in the forward printer map requires re-computation of the inverse map if accurate and
consistent color reproduction is to be maintained. An efficient and accurate method of
computing the inverse map could be used in an automatic color correction system.
Three algorithms for computing the inverse of the forward printer map are studied in this
thesis project. These are the Shepard's, Moving Matrix, and Iteratively Clustered
Interpolation (ICI) algorithms. The algorithms are implemented in C and simulated in
order to benchmark their relative accuracy, speed, and complexity. The simulations show
the ICI algorithm to be the fastest and most accurate at computing the inverse map, and
its complexity does not far exceed that of the other algorithms. The ICI algorithm was
implemented in VHDL and synthesized to a Synopsys generic library in order to
determine the approximate size and speed of an ASIC that could perform the inverse
computation. The final implementation resulted in two modules: one that implements the
ICI algorithm, and one that implements the trilinear interpolation function that is used by
the ICI algorithm. The synthesized ICI module contained 112,683 cells, and the
synthesized trilinear interpolation module contained 190,357 cells. The timing of the
modules resulted in a 40 nanosecond clock period, which corresponds to a maximum
operating frequency of 25 MHz. These synthesized results show that this algorithm is
suitable for an ASIC that could be used in a real-time automatic color correction system.
Table ofContents
1 INTRODUCTION
2 BACKGROUND INFORMATION
2.1 TheXerographic Printing Process 3
2.2 Color Spaces 4
2.2.1 RGB Color Space 4
2.2.2 CMY Color Space 4
2.2.3 LAB Color Space 6
2.3 GamutMapping 7
3 DEVICE MODELING 10
3.1 Forward PrinterMap 11
3.2 Inverse PrinterMap 12
4 THEORY 13
4.1 Algorithms 16
4.1.1 Shepard'sAlgorithm 17
4.1.2 MovingMatrix Algorithm 1 8
4.1.3 Iteratively Clustered Interpolation Algorithm 1 9
4.1.4 Trilinear Interpolation 23
5 SOFTWARE IMPLEMENTATION 25
5.1 AlgorithmMetrics 25
5.1.1 DeterminingAlgorithm Complexity 25
5.1.2 DeterminingAlgorithm Execution Speed 26
5.1.3 DeterminingAlgorithm Accuracy 26
5.2 OptimalParameters 27
5.2.1 Shepard'sAlgorithm 28
5.2.2 MovingMatrixAlgorithm 28
5.2.3 ICIAlgorithm 28
5.3 SimulationResults 30
6 ICI HARDWARE IMPLEMENTATION 32
6.1 DesignMethodology 32
6.1.1 System Partitioning andArchitecture 33
i
6.1.2 Datastorage 33
6.1.2.1 Operation 34
6.1.2.2 Memory Data Layout 35
6.1.3 DataRepresentation 36
6.2 SynthesisMethodology 37
6.3 Simulation & Testing 39
6.4 HardwareModules 40
6.4.1 Trilinear Interpolation Module (tritop.vhd) 40
6.4.1.1 Operation 42
6.4.1.2 Sub-blocks 43
6.4.1.3 Summary ofResults 50
6.4.2 ICIModule (icitop.vhd) 51
6.4.2.1 Operation 55
6.4.2.2 Input Description 55
6.4.2.3 Sub-blocks 56
6.4.3 Summary ofResults 63
6.5 SystemOverview 63
7 CONCLUDING REMARKS 65
7.1 Conclusions 65
7.2 FutureWork 66
7.3 Acknowledgements 66
8 REFERENCES 68
9 APPENDICES 69
9.1 Software Simulation accuracyResults 69
9.1.1 Shepard ' s Interpolation 70
9.1.2 MovingMatrix 71
9.1.3 ICI 71
9.2 VHDL Code 73
9.2.1 TRIJTOP.VHD 73
9.2.2 TRI_CTRL.VHD 77
9.2.3 TRICXBLOCK.VHD 84
9.2.4 TRI_TERM_BLOCK.VHD 85
9.2.5 TRI_ADDRJBLOCK.VHD 86
9.2.6 TRI_TB_TOP.VHD 87
9.2.7 ICITOP.VHD 89
9.2.8 ICI_CTRL.VHD 93
9.2.9 ICIJDP.VHD 103
9.2.10 DIS_DP.VHD 104
9.2.11 ADDR_CALC.VHD 105
9.2.12 TABLEMEM.VHD 106
11
List of Figures
Figure 1: The Xerographic Printing Process 3
Figure 2: Axes of the CIELAB color space 6
Figure 3: GamutMapping for Constant Lightness and Hue 10
Figure 4: System of P"1 and P 14
Figure 5: Sub-cube with Lattice Points Determined During Extraction 23
Figure 6: ID Linear Interpolation 24
Figure 7: MemoryModule Diagram 34
Figure 8: Top Level Diagram ofTrilinear InterpolationModule 40
Figure 9: Submodule Diagram ofTrilinear InterpolationModule 41
Figure 10: Top Level Diagram ofTrilinear Interpolation TRICTRL Block 44
Figure 11: Top Level Diagram ofTrilinear Interpolation TRIADDRBLOCK 45
Figure 12: Datapath Architecture ofTRIADDRBLOCK 46
Figure 13: Top Level Diagram ofTrilinear Interpolation TRICXBLOCK 47
Figure 14: Datapath Architecture ofTRI_CX_BLOCK 47
Figure 15: Top Level Diagram ofTrilinear Interpolation TRI_TERM_BLOCK 48
Figure 16: Datapath Architecture ofTRITERMBLOCK 49
Figure 17: Top Level Diagram of ICI Module 51
Figure 18: Submodule Diagram of ICIModule 52
Figure 19: Top Level Diagram of ICI_CTRL Block 57
Figure 20: Top Level Diagram of ICI ADDRCALC Block 58
Figure 21 : Datapath Architecture of ICI ADDRCALC Block 58
Figure 22: Top Level Diagram of ICI DIS_DP Block 59
Figure 23: Datapath Architecture of ICIDISDP Block 60
Figure 24: Top Level Digram of ICIDP Block 61
Figure 25: Datapath Architecture of ICIJDP Block 62
Figure 26: Diagram of System to Compute
P"
64
Figure 27: Diagram of System to Interpolate Image Data 65
m
List ofTables
Table 1: Optimal Algorithm Parameters 29
Table 2: Algorithm Accuracy 30
Table 3: Algorithm Execution Time 30
Table 4: Algorithm Complexity 30
Table 5: Trilinear Interpolation Module I/O Signals 42
Table 6: Trilinear Interpolation Module Synthesis Results 50
Table 7: ICIModule I/O Signals 53
Table 8: ICI Module Synthesis Results 63
iv
1 Introduction
One of the most crucial challenges facing today's printer manufacturers is to design
printers that accurately and consistently reproduce the colors in images [1]. This is a
difficult task because the transformation from the printer's input color space to its output
color space is non-linear and drifts over time. Color reproduction is also affected by the
physical properties of the media, including temperature, moisture content, brightness, and
weight. These variables are constantly changing and can lead to inaccurate and
inconsistent color reproduction. When this occurs, the printer must be re-calibrated. The
calibration process is typically manual and time-consuming and reduces the productivity
of the printer.
A color correction system is needed to account for these variations in order to accurately
and consistently reproduce the colors in printed images. Solutions that can automate the
color correction process will lead to very productive printers with little "downtime".
There is an abundance of research activity aimed at developing efficient and effective
color calibration and control systems [6, 8, 10]. The forward and inverse printermaps are
central to many of these systems.
A forward printer map is a practical and accurate model of a printer. This forward map
takes the form of a multidimensional look-up table and is constructed by performing
input-output color experiments on an actual printer. The inverse of this forward map,
called the inverse printer map, is needed to convert an input color specification in a
device-independent color space to a color in the printer's device-dependent color space
1
before being presented to the print engine. If the inverse printermap is accurate, the print
engine will produce a color reproduction that closely matches the original device-
independent color specification. Any measured change in the forward printer map
requires re-computation of the inverse map if color accuracy and consistency is to be
maintained. A method for accurately and efficiently computing the inverse of the
forward map is sought.
Section 1 of this paper is this brief introduction. In Section 2, several important topics
relating to this project will be discussed. Section 3 describes how printers are modeled.
Section 4 presents the problem at hand and the theory behind the problem. The inversion
algorithms are also discussed in this chapter. Section 5 describes the software
implementation and results. Section 6 describes the hardware implementation and
results. Finally, Section 7 concludes the paper with a discussion of conclusions, future
work, and acknowledgements. References and Appendices can be found at the end of the
paper. The CD accompanying this paper includes the paper itself as well as data files,
source C and VHDL code, netlists, and area and timing reports of the synthesis results.
2 Background Information
2. 1 TheXerographic Printing Process
Imaging systems consist of a network consisting primarily of video display monitors,
scanners, and printers. Each of these devices have their own device-specific color space;
generally RGB for monitors and scanners and CMY for printers. These devices
communicate image data with each other using a psycho-physical based and device-
independent color space, such as LAB.
A specification in the device-independent color space is transformed into a device-
dependent color space using amultidimensional color correction table or inverse color
map. This inverse map is based on the forward transformation characteristics of the
printer. Once converted into the device-dependent color space, the color-separated image
is converted to halftones and delivered to the print engine that is responsible for the
physical production of the print [8]. This process is illustrated in Figure 1.
Color
Specification
LAB
Printer
Pre-processor
(Inverse Color Map) CMY
Print Engine
(Forward ColorMap)
Color
-? Reproduction
LAB
Figure 1: The Xerographic Printing Process
2.2 Color Spaces
Color is specified in a three dimensional space where each point in the space describes a
single color. There are several such color spaces, such as RGB, CMY, and LAB. This
project focuses on printers; therefore, this research work was done using the CMY and
LAB color spaces described in this section. The RGB color space is also described in
order to further explain and contrast the additive and subtractive nature of the color
systems. Although not used directly in the printing process, RGB is the color system
used on the displays that show the images prior to printing and to which the color
reproduction is often compared.
2.2.1 RGB Color Space
In the RGB space, a color consists of the primary components red, green, and blue. The
RGB system is additive, meaning that varying amounts of the primaries are added
together to produce a given color. The additive nature of the RGB system makes it the
color system of choice for devices that are self-illuminating light sources, such as
displays and scanners. This color space is known as a device-dependent color space
because the same RGB color specificationwill vary perceptually on different devices.
2.2.2 CMY Color Space
Another device-dependent color space is the CMY space. A color in this space consists
of the primary components cyan, magenta, and yellow. This system is subtractive,
meaning that varying amounts of the primaries are used together to absorb certain colors;
those colors that are reflected, rather than absorbed, combine to form the perceived color.
Color prints are illuminated by an external light source, and the perceived colors are due
to the combination of reflected wavelengths. Thus, a printer's input color space is
typically described in CMY.
Printers typically add a fourth ink - black - which results in four-color CMYK. The
black ink is added for several reasons. To produce black using only the CMY primaries,
100 percent of each ink must be used. This is a disadvantage because color inks are more
expensive and because using 100 percent of the three primaries results in a very wet print
that is susceptible to wrinkles and takes a long time to dry. Furthermore, the combination
of full CMY usually results in more of a muddy brown than black color. Also, because
the overlay of inks may not be exact, black portions of a print may have edges that do not
have full CMY coverage. If these reasons are not enough, including black in the color
mix can increase the contrast of colors thereby increasing a printer's color gamut.
Replacing some percentage of the three primaries is achieved by gray component
replacement and under color removal [7]. A color can be specified using only these
primaries; the black component is added later by a separate and independent processing
step. In this project, we need only consider the three primaries cyan, magenta, and
yellow.
2.2.3 LAB Color Space
The LAB (CIELab) color system was developed by the Commission Internationale de
l'Eclairage (CIE) in 1976 [7]. There is a distribution ofwavelengths for any given light
that is called the spectral power distribution. This system is based on the spectral power
distribution of colors. The CIE defined three color-matching functions for humans that
are based on the human psycho-physical perception of color. The integral of the spectral
power distribution is weighted by the three color-matching functions to provide what is
known as the tristimulus values of the color. The LAB system defines a three
dimensional color space with tristimulus values L, a, and b. The L component is a
measure of lightness of a color. This axis varies from 0 (black) to 100 (reference white).
The a component is a chromatic measure of red-green, and the b component is a
chromatic measure of blue-yellow. The maximum of a, specified only by +a, is only red
with no green, while the minimum of a, specified only by -a, is only green with no red.
Similarly, the maximum of b, specified only by +b, is only yellow with no blue, and the
minimum of b, specified only by -b, is only blue with no yellow. The axes of this color
space are shown in Figure 2.
Figure 2: Axes of the CIELAB color space.
The LAB color space is device-independent; color specification using this system are
representative only of the color's spectral power distribution and is in no way related to
any particular device. Colors with identical spectral power distribution are perceived to
be exactly the same color, provided they are seen in the same visual environments,
regardless of the manner in which the color is being displayed. This color space is the
industry standard for describing colors in a manner that is independent of any device.
2.3 GamutMapping
Printers are incapable of reproducing every single color in a given color space. The set or
range of colors that a printer can reproduce is known as the printer's color gamut. An
input color specification can be located either inside or outside of this gamut. A printer
can immediately reproduce a color that is inside the gamut. Colors located outside of the
gamut are not reproducible by the printer and must be mapped to a color in the gamut.
This section describes the gamut mapping process that was used in this project [4].
Gamut mapping was necessary after constructing the structured input of the inverse
printer map (see Section 4). The process consists of two steps: first determining whether
an input color specification (input point) is located inside or outside the gamut, then
mapping colors that are outside of the gamut to colors inside the gamut.
The procedure to determine whether an input point is located inside or outside of the
printer gamut begins by upsampling the forward printer map P. This
133 data set was
7
upsampled to
643
using trilinear interpolation. The upsampling creates a continuous 3D
solid with a smooth boundary in the CMY and corresponding LAB color spaces. The
boundary points of the CMY structure are assumed to correspond to the boundary points
of the LAB structure. This is a practical and acceptable assumption for most xerographic
printers.
The input point and the boundary points of the LAB structure are shifted with respect to
the centroid of the LAB structure. This centroid is the average of all of the points in the
gamut, approximately L=50, a=0, and b=0 for most xerographic printers. These points
are then converted to spherical coordinates r, a, and 9. The conversion is given by the
following equations:
r = J{L-LE)2+{a-aE)2+{b-bEy
a = arctan
f(b-bE)^
K{a-aE)j
9 = arctan
f \
(L-LE)
J(a-aE)2+(b-bE) 2 J
In the above equations, [LE aE bE] specifies the centroid of the LAB structure, r is the
distance from the centroid to the input point, a is the hue angle with range 360. 9 is the
angle in a plane of constant awith range 180.
The set ofboundary points is searched for those points that fall into the range aAa and
9 A9 of the input point. For this project,
Aa= A9= 5. Geometrically, this forms a
cone around the input point. The average distance r is calculated for all boundary points
that fall within this cone. If the distance parameter of the input point is greater than this
average, then the input point is considered to be outside of the gamut. Conversely, if the
input point's distance is less than the average, it is considered to be inside the gamut.
A point outside the gamut should be mapped to a point in the gamut such that the original
and mapped points are as perceptually close as possible. Research has shown that the
perceptual difference between two colors is lowest when the lightness and hue of the
colors are equal [5, 7]. Therefore, the mapping approach used for this project aimed to
preserve the lightness and hue of the original point. The boundary of the LAB structure
was searched to find the point that best satisfied the mapping criteria described above and
given by the follow equations:
L = L'
a'
= arctan = arctan
[a]
= a
[L a b] specifies the original input point and [L' a' b'] specifies the mapped point. The
mapping is illustrated in Figure 3.
b
Note: a-b plane for constant L shown
(L=L')
Figure 3: Gamut Mapping for Constant Lightness and Hue
3 Device Modeling
Device color calibration requires a model of the device. These models are generally
either theoretical or empirical. The advantage of theoretical models is that color
prediction can be performed with relatively few actual measurements. However, these
analytical models are usually not very accurate because they do not adequately capture
the nonlinearities of real systems due in large part to external variables. Examples of
such external variables are temperature, humidity and the weight and type of printable
media. These variables have a significant affect on the colors that are reproduced. The
Neugebauer equations are a classic example of color prediction using a theoretical model
[5].
10
Polynomial regression is one type of empirical model. This model suffers from accuracy
problems; prediction of colors close to the sample points used for the regression provide
acceptable accuracy, but for the rest of the points in the gamut there is no guarantee [9].
The most accurate and practical, and therefore the industry standard, empirical model is a
look-up table containing input-output pairs that characterize the device. This model is
discussed in detail in the following section.
3. 1 Forward PrinterMap
A color printer can be viewed as a device that maps colors from an input color space to
an output color space. The printer can be modeled by a look-up table containing points in
the input color space and their corresponding mapping to points in the output color space.
The full model can be realized by interpolation for input points not contained in the look
up table. The look-up table is an approximation to the actual printer function; the larger
the look-up table, the better the approximation. This look-up table characterizes the
printer function and is referred to the forward printer map. The forward printer map is
denoted P.
The forward printer map is constructed experimentally by printing colors in the device's
input color space on paper and measuring these color patches in the output color space
using a spectrophotometer. The disadvantage of this model is that many color
experiments must be performed in order to construct a forward map that is an accurate
representation of the device. Performing these color experiments is time-consuming,
11
especially because this process currently non-automated. For example, to cover the entire
gamut of a typical printer, a 10x10x10 entry table resulting in 6,000 parameters is needed
for acceptablemodeling accuracy [6, 8].
The input of the forward printer map can be structured by sampling the printer's input
space in equal steps along each axis of the source domain. These sampled points become
the "node", "grid", or "lattice" points of the forward map. For n levels of division, this
gives
(n-1)3
cubes and
n3 lattice points. The n3 lattice points of the source space are
printed, and these color patches are measured to determine the output color space
specification. The corresponding values from the source and destination spaces populate
the look-up table. Note that the output of P will be unstructured (or non-uniform)
because the transformation from input color space to output color space is typically
highly nonlinear.
3.2 Inverse PrinterMap
The inverse printer map, denoted
F1', is constructed from the forward printer map P. The
inverse map associates points in the output color space with points in the device's input
color space. Simply swapping the data from P is not desirable for two reasons. First, the
inverse look-up table will not be well defined for colors near the gamut boundary because
it is possible that more than one input point be mapped to the same output point. Second,
it results in a look-up table with unstructured input. Interpolation with data that has
unstructured input is complex and time-consuming. On the other hand, if the input is
structured, very fast and efficient geometric interpolationmethods can be utilized.
12
Geometrical linear interpolation is preferred over other non-linear interpolation
techniques because it is more efficient and accurate. There are several geometrical
interpolations, including trilinear, prism, pyramid, and tetrahedral [5]. Each of these vary
in terms of accuracy, extraction (or search) complexity, and interpolation (or
computation) complexity. In this thesis project, trilinear interpolation was utilized for
look-up table interpolation (see Section 4.1.4).
Instead of simply swapping the data from P, algorithms are required that interpolate
irregularly sampled multi-dimensional data from P. These algorithms must be capable of
efficiently computing the inverse map so that it has structured input points and so that it
is as best an approximation to the true inverse as possible. Several of these algorithms
are discussed in the next section.
4 Theory
Consider the forward printer map P that maps a point v in the printer's input color space
to a point z in the output color space. Also consider the inverse printer map
P1
that maps
a point x in the target color space to a point y in the printer's input color space. Given a
point x in the target color space, the inverse color map is used to find a point v in the
printer's input color space that the printer will map to a point z in the output color space.
This system is depicted in Figure 4.
13
-? z
Figure 4: System ofP1 and P
The desired result is for the printer to print a color z that, when measured, will be as close
as possible to the requested color x. The difference between the target and output colors
can be quantified by computing the Euclidean distance between them. This difference is
known as AE(P(y),x) or simply AE and is given by Equation 1 .
AE = \\z-x\\ = \\P{y)-x (1)
Note that
z=P(y)
In general, and specifically for this project, the target and output color space is the
device-independent LAB color space. For the target color (LAB)jn and the output color
(LAB)out, Equation 1 becomes:
A = J{Loul-LJ2+{Aoul- Amf + {Boul -Bj
14
The smaller the value ofAE for a given input, the better the inverse map. Humans do not
perceive color differences less than 1.0. AE = 1.0 is know as the "just noticeable color
difference." This leads to the following optimization problem:
Given a target color x and the forward printer map P, find the printer input y that
solves
minAE{P{y),x) (2)
y
Solving this optimization problem results in the inverse printer map
P'1
. The structured
inverse printer look-up table is constructed according to the methodology outlined in [1],
re-stated below:
1 . Obtain the estimated forward printer map, P, of a given color printer. This is
achieved by equally sampling the printer input space Y and then the printed
color Zj for each grid node y is obtained from experiments on the actual
printer. In addition, some interpolation technique is used to estimate the
outputs corresponding to input points that are not grid nodes.
2. Grid the target color space XXo obtain an ordered collection of vectors Xj, i =
1, 2, ..., h (each x{ corresponds to a grid node). This will result in grid nodes
that are outside of the printer gamut; these points must be first mapped onto
the gamut using the approach described in Section 2.3.
3. Obtain the yf that solves Equation 2 for Xj and P, for i = 1, 2, ..., h.
15
4. Select a mutidimensional interpolation technique to obtain the inverse map
outputs y corresponding to input points x that are not exactly grid nodes.
Several algorithms have been proposed for solving the optimization problem given by
Equation 2. These algorithms can vary widely in terms of accuracy, numerical stability,
speed, and complexity. Three of these algorithms are Shepard's Interpolation [3],
Moving Matrix [2], and Iteratively Clustered Interpolation [1]. These algorithms were
the subject of study in this thesis. The details of each algorithm are discussed in the
following sections. Trilinear interpolation is used frequently by the ICI algorithm to
interpolate through P and F1; it is discussed at the end of this section.
4.1 Algorithms
The algorithms will be presented in an abstract sense, where x represents colors in the
target color space, y represents colors in the printer's input color space, and z represents
colors in the printer's output color space.
For the algorithms discussed in this section, recall that he forward printer map P contains
N number of entries and defines the mapping function:
Zj=P(yj) for 0<j<N
16
For the forward printer map used in this project, the printer's input color space was
CMY, and its output color space was LAB. The algorithms can be applied to this specific
application by remembering that x, y, and z are equivalent to the color space vectors:
x = [LAB]
y = [CMYj
z=[LAB]
4.1.1 Shepard's Algorithm
Shepard's algorithm is based on the work of Donald Shepard and is a well-established
method for interpolating scattered data [3]. It is essentially an application of weighted
averaging. The value for a given input point is calculated by a weighted average of all
other data points, where the weighting is a function of the distance from the given point
to the other data points.
Given a color x in the target color space, Shepard's algorithm computes the inverse j?by
the following equation:
y
ifdj ^ 0 for all z}
y . if dj = 0 for some Zj
where dj is the Lp norm between the input point x and z/.
dj=hj-x\
17
There are two variable parameters in this algorithm: p and ju. The p parameter specifies
how the distances between points are calculated. The /u parameter affects the locality of
the weighting function; large values of /u result in more local behavior, which means that
only those points closest to the input point will be significant.
4.1.2 Moving Matrix Algorithm
The Moving Matrix algorithm computes the printer inverse using linear weighted least-
squares regression [2]. The inverse is given by the following equation, where A is the
transformation matrix.
y =
xAT (3)
The transformation matrix A is found by minimizing the weighted squared error given by
the following equation:
E=z;=I^h-^ir (4)
Differentiating Equation 4 with respect to A, setting it equal to zero, and solving for A
yields the closed form solution:
A Ut^I
where
.=IJ'>jk4 md
The output;? can then be calculated according to Equation 3.
18
The weighting is a function of the distances from the input point x to all of the other
points Zj contained in the forward printer map:
1
df+e
Here dj is the Euclidean distance from the input point x to the point Zj.
The variable parameters of this algorithm are /u and s. These parameters affect the
locality of the regression. Large values of /u and small values of s give W) more local
behavior, which means that only those points closest to the input point will be significant.
4.1.3 Iteratively Clustered Interpolation Algorithm
The Iteratively Clustered Interpolation (ICI) algorithm is a gradient-based optimization
method that uses an iterative technique to generate initial points [1].
An unconstrained gradient-based optimization algorithm to solve Equation 2 would be
given by [11]:
y{k + \) = y{k)-f3
rdAE{y,x)2^
dy
k > 0, for a given y(0) (5)
Note that AE(y,x) is Equation 1, restated below:
A = [z-x| = |P(v)-x|
19
Thus,
A(y,x)2= \\P{y) - xf = (P{y) - xj (P{y) - x)
The gradient ofAE(y,x)2 with respect to y for a fixed x is then:
dAE{y,x)2
dy
= 2J{y)'{P{y)-x) (6)
Substituting Equation 6 into Equation 5 gives the following update equation:
y{k + \) = y{k)-vJk<{P{y{k))-x) (7)
where /u=2fi. P{y{k)) -x is a component vector of the differences between the target
color x and the output color z produced by interpolating y(k) through the forward printer
map P:
P{y{k))-X =
Zk,0 XQ
Zk,l -X,
Jk.2 -x2_
Jk is the Jacobian of (P(y(k)) - x) evaluated at v = y(k), and is given by:
Jk =
dz1 dz2 dz3
dyx
dzx
dy,
dz2
dy,
dz3
dy2
dz]
dy2
dz2
dy2
dz3
dy3 dy, dy,_
20
This matrix is also known as the gradient matrix. It is calculated using numerical
differentiation of the forward printer map P. dz is an average of the forward and
backward finite differences that result from separately varying each component of v and
interpolating through P.
Gradient-based optimization methods suffer from drawback that a solution may be
optimal only in a local sense, rather than being the global optimum. In order to avoid this
situation, the initial estimation of the inverse must be close to the actual globally optimal
solution. A novel and efficient procedure to determine a "good" initial estimate is
presented in [1] and is re-stated as follows:
1 . Search the z,- points ofP for the one that is closest to the given target point x. Call
this point zaux.
2. Find the corresponding point yaux. This is the point such that zaux
= P(yauJ- Yaux
is a node point in the grid of the Y space, which is a point in the input space of the
forward printer map P. This yaux is a course estimate for the inverse ofx.
3. Select M points in the neighborhood ofyaux; generate a cluster ofM points (yauxl,
yaux2, , yauxM) by moving along axes around yaux. Map these M points to obtain
Zauxj
= P(yauxj), j = 1, 2, ..., M.
4. Find the closest of the zauxj points to the input x and call this point z0. Take the
initial estimate y(0)=y0 to be the point such that z0
= P(yo)-
21
After using this procedure to find the initial estimate P(0), the update equation given by
Equation 7 can be used. Define parameters kmax and s that are the stop criteria of the
iteration of the update equation.
s specifies the error threshold at which the update iterations can stop, such that
|P(y(*))-x|<* (8)
kmax is the maximum number of iterations that algorithm will perform when computing
the inverse for a given point. It is useful to define this parameter as a stop criteria
because not every input point will result in an error that is below the threshold s. When
either Equation 8 is satisfied or k > kmax, the algorithm stops and y(k) is taken as the
solution.
The parameter 8 for this algorithm specifies the perturbation used in the numerical
differentiationmethod used to compute the gradient matrix.
The parameter ju should be selected in order to achieve fast convergence and to meet
accuracy requirements. This parameter should be bounded by the following equation, as
discussed in [1]:
2
0< ju<
trace{J0J0')
22
4.1.4 Trilinear Interpolation
Trilinear interpolation of a look-up table is a three-dimensional geometric method for
computing the output values of input points that are not contained in the look-up tables
[5]. For the sake of this discussion, the look-up table is either the forward or inverse
printer map, and this interpolation technique is used with these maps for input points that
are not grid points.
Trilinear interpolation consists of two steps: the extraction step and the interpolation step.
In the extraction step, the sub-cube in the source color space that contains the input point
is determined by a series of comparisons. The eight vertices of this cube are the lattice
points in the source space, as shown in Figure 5.
Pon(xo,yi, zO
Pooi(xo!yo, z0
Poio(xo,Vi, ZoLfc
Pooo(xOryo, Zo)
Pm(xi,yi,Zi)
y<>, zO
Pno(xi,yi, zo)
Pioo(xi>yo, Zq)
Figure 5: Sub-cube with Lattice Points Determined During Extraction
The interpolation step consists of the repeated use of one-dimensional linear
interpolation. The following discussion is a brief review of ID linear interpolation.
23
Referring to Figure 6, a point p on the curve between lattice points po and pi is to be
interpolated.
xO x
Figure 6: ID Linear Interpolation
The interpolated value, pc(x), is linearly proportional to the ratio (x-xo)/(xi-xo).
Therefore,
Pc(x) = P(X0) + [p{*i) - P{*o)]
Oi-*o)
Three-dimensional trilinear interpolation consists of seven linear interpolations on the
sub-cube depicted in Figure 5. This results in the following equations [5]:
Ax = x - xo
Ay=y-y0
Az = z-zo
Co =Pooo
ci = (pioo -pooo) / (xi - x0)
C2 = (poio -pooo) / (yi -yo)
C3 = (P001 -POOO) / (Zl - Zo)
c4 = (pi io -poio -pioo + Pooo) / [(xi - x0) (yi - yo)]
C5 = (Pioi -pooi -Pioo +P000) / i(xi -xo) (z\-zo)]
C6 = (pon -Pool -Poio +P000) / [(yi -yo) (zi-z0)]
cj = (pm -pon -pioi -pi10 + Pioo + Pooi +Poio-Pooo) /
[(xi - x0) (yi -yo) (zi - z0)]
p(x,y,z) =c0 + cjAx + c2Ay + c3Az
+ c4AxAy +
c^AxAz + C(,AyAz + CjAxAyAz
24
5 Software Implementation
Software programs implementing the Shepards, Moving Matrix, and ICI algorithms were
written in the C programming language. The programs read data files that contained the
forward printer map and the gamut-mapped input points of the inverse map. The
programs executed the algorithms on this data to compute the inverse for the given input
points; this output data was written to separate data files for later analysis. The forward
printer map was represented by two data files: one had
133 CMY entries, and the other
had the corresponding
133 LAB entries. This experimental printer data was obtained
from Dr. L. K. Mestha from Xerox.
5. 1 Algorithm Metrics
These software implementations and simulations provided relative accuracy, execution
time, and computational complexity information for each algorithm.
5.1.1 Determining Algorithm Complexity
The complexity of the algorithms was measured objectively by including code to count
the number of multiplication/division and square root operations performed to compute
the inverse. These operations are costly in hardware because the modules to perform
them are both large and slow as compared to other operations. The algorithms were also
studied from a subjective standpoint to ascertain a general idea of the number of required
25
arithmetic modules, the amount of parallelism that might be exploited, and the
complexity of the required control logic. A combination of the objective and subjective
complexitywas used to rate each algorithm's overall complexity relative to the others.
5.1.2 Determining Algorithm Execution Speed
The csh command interpreter running on the Unix platform has a built-in utility that can
be used to determine the amount of time a program is executing on the system. The
execution times of the programs were measured using this utility, which is an executable
named time. In order to form a valid comparison, the programs were executed on the
same machine under the same environment. Program development and execution was on
Rochester Institute of Technology's Grace computing system. Grace is an Alpha 4100
5/533 with 3 EV5.6 CPUs. It has 1.5GB ofmemory and is running Tru64 UNIX, version
4.0F.
5.1.3 Determining Algorithm Accuracy
The programs computed the inverse for the
133
gamut-mapped LAB points x using the
forward printer map data. The completion of the programs resulted in a data file
containing
133 CMY entries that defined the inverse y :
y = P~\x)
26
Each point 5>;was interpolated through the forward printer map P to find the resulting zj.
The differences AEj between zy and the corresponding xj such that jk= F'(xj) were
computed. This can be shown graphically as:
"j
*,. >yj >ZJ
AEj
'
The mean error and standard deviation, the minimum error, and the maximum error were
calculated from the AEj - these statistics represented the accuracy of the algorithms. It
should be noted that these error statistics were computed using the node points of the
inverse map as inputs. The accuracymetrics of each algorithm were used to compare the
algorithms in terms of accuracy. Separate programs were written to perform the
interpolation through P and to compute the accuracy metrics; these are also included on
the CD accompanying this paper.
5.2 Optimal Parameters
Each algorithm was executed several times while varying the algorithm parameters.
Accuracy metrics for each trial were measured to gain insight into how the parameters
affect the accuracy. These trials also led to the values of the parameters that result in the
best accuracy. Tables showing the accuracy data for different parameter values for each
algorithm are shown in the Appendix (see Section 9.1). The results of these trials are
briefly described next.
27
5.2.1 Shepard's Algorithm
For Shepard's algorithm, there are the two variable parameters/* and ju. The/? parameter
specifies how the distances between points are calculated. The trials showed that the
value of p does not have much affect on the accuracy. Therefore, p = 2 was taken
because this is the most convenient as it results in a distance measurement that is simply
the Euclidean distance. The /j. parameter affects the locality of the weighting function.
The simulation results show that ju = 5 provides the best accuracy.
5.2.2 Moving Matrix Algorithm
The variable parameters in the Moving Matrix algorithm are ju and e. Similar to the same
parameter in Shepard's algorithm, the // parameter affects the locality of the regression.
Simulations showed that ju = 5 resulted in the best accuracy. The e parameter does not
have a significant affect on the accuracy and is taken as a small value to avoid ill-
conditioning of the Si matrix. In these simulations, fwas taken as
10"
.
5.2.3 ICI Algorithm
The ICI algorithm has parameters e, kmaXt 5, and /j.. s and kmax are the stop criteria; s
specifies the error threshold at which the update iterations can stop, and kmax is the
maximum number of iterations that algorithm will perform when computing the inverse
for a given point. The parameter 8 for this algorithm specifies the perturbation used in
the numerical differentiationmethod used to compute the gradient matrix. The parameter
28
fu should be selected in order to achieve fast convergence and to meet accuracy
requirements.
The software simulations showed that the mean error for this algorithm can be reduced by
decreased the error threshold s. In order to achieve fast convergence, swas taken as 0.5.
This is below the "just noticeable difference" ofAE = 1.0. Other simulations have shown
that even better accuracy can be obtained by further reducing s [4]. The software
simulations also showed that the best choice of kmax is 50. Increasing this parameter
beyond 50 does not result in substantial gains in accuracy; if the algorithm is going to
converge for a given input point, it will do so within 50 iterations. The perturbation 5
was determined to not have any measurable affect on the accuracy of the algorithm; 5
was taken to be 10, which is approximately a 4% variance of the printer's input color
components ( 0 < C, M, Y < 255 and 10/255=0.04) and is more or less arbitrary. Finally,
the simulations showed /j. = 4 to be a good choice in terms of accuracy and convergence.
Table 1 shows the optimal parameters found for each algorithm.
Table 1: Optimal Algorithm Parameters
Shepard's P = 2
/j = 5
MovingMatrix
=10"4
ICI 5=0.5
5=10
M = 4
29
5.3 Simulation Results
The accuracy, speed, and complexity of each algorithm is summarized in Tables 2, 3, and
4, respectively. The metrics are obtained from the simulations using optimal algorithm
parameters and operating on the same data and on the same machine in the same
environment. All measurements are made for the algorithms computing the
133 inverse
map from the
133 forward map.
Table 2: Algorithm Accuracy
Mean
AE
Std
Dev
Mean +
2*StdDev
Min Max
Shepard's 1.64 0.54 2.73 0.15 3.81
MovingMatrix 1.28 2.29 5.86 0.02 54.88
ICI 0.39 0.12 0.63 0.05 1.68
Table 3: Algorithm Execution Time
Execution Time (sec)
Shepard's 12
Moving Matrix 17
ICI 7
Table 4: Algorithm Complexity
Shepard's LOW
MovingMatrix HIGH
ICI MEDIUM
ICI is clearly the winner in terms of accuracy and execution speed. A brief discussion of
each algorithm's complexity follows.
Moving Matrix has significantly more multiplications than Shepard's and approximately
five times more multiplications than ICI; this would lead to a comparatively large and/or
30
slow design. Moving Matrix also would require complex control logic to perform matrix
operations such as matrix inversion.
Shepard's algorithm would be quite easy to implement because it requires a small
number of resources and is data flow oriented thus requiring very little control logic.
However, there are a large number ofmultiplications (three times that of ICI), which will
result in a slow design. Even if more resources were used to perform some of the
processing in parallel (which would increase the design area), it is still unlikely it would
beat out ICI in terms of speed. Although easy to implement, Shepard's is inaccurate and
would most likely be slower in hardware than ICI.
ICI has the smallest number of multiplications, which could result in a less complex
circuit. However, ICI has two significant drawbacks. First, this algorithm requires
trilinear interpolation, which itself is a non-trivial hardware implementation. Second, ICI
will require control logic for computing the gradient matrix and for computing the initial
estimate by the clustering technique.
Of the three algorithms, ICI is the best in terms of speed, accuracy, and hardware
complexity. This algorithm was chosen for hardware implementation in VHDL. The
results of this phase of the project are presented in the next section.
31
6 ICI Hardware Implementation
NOTE: All source VHDL files are located on the accompanying CD under the
"Hardware/VHDL Source Code" folder. The area and timing reports for each sub-
module described in this section can also be found on the accompanying CD under the
"Hardware/Reports" folder. VHDL source code is also shown in the Appendix (see
Section 9.2).
6. 1 Design Methodology
The algorithm was implemented using VHDL. The design was simulated with Mentor
Graphic's ModelSim; the simulation results were compared with results obtained from
software models operating on the same input data. Correct functionality of the design
was guaranteed since the hardware simulation results matched those of the software
models.
The hardware modules were synthesized using Synopsys's Design Compiler and was
targeted to a generic library provided with the Synopsys's tools. The resulting synthesis
results provide a net-list that can be used for ASIC fabrication.
Several important design decisions were made about the system partitioning and
architecture, data storage, and data representation. These decisions will be described in
the following sections.
32
6.1.1 System Partitioning and Architecture
The ICI algorithm makes extensive use of trilinear interpolation. Trilinear interpolation
is a geometric method to calculate the output of a 3D function at a given input point,
where the 3D function is defined by a finite set of input and output pairs. A detailed
discussion of trilinear interpolation can be found in Section 4.1.4. An entirely separate
module was designed to perform trilinear interpolation; this module is used by the ICI
design and can also be used in other applications.
Both the trilinear interpolation module and ICI module were designed using datapath
blocks and control blocks. This approach allows for highly controlled use of the
arithmetic units that make up the datapaths. The advantage of having such control over
these units is that operations can be pipelined and the units can be shared where
appropriate.
6.1.2 Datastorage
There are six values for each entry in the forward printermap P; one for each of the input
space components and one for each of the output space components. For example, if the
P defines a mapping from CMY to LAB, then there is a value for C, M, and Y for the
input color space, and a value for L, A, and B for the output color space. An
N3
look-up
table requires
6N3 bytes of storage. For N=13, this results in 13,182 bytes of required
storage.
33
Synthesis tools are still incredibly inefficient at synthesizing memory; therefore, this
design assumes that an external memory module exists that contains P. The memory
module in Figure 7 is assumed for design purposes. The interpolation and ICI modules
are designed to interface with this memory module.
C> data out
Figure 7: MemoryModule Diagram
6.1.2.1 Operation
The memory module is synchronous and clocked by the elk input signal; all reads and
writes occur on the rising edge of elk and complete in one clock cycle. The operation of
the module is described below.
(a) Read: Reads occur when wr is low; data_out becomes the value stored at the
location specified by addr.
(b) Write: Writes occur when wr is high; the location specified by addr is stored
with the value specified by data_in, and data_out is the same value as data_in.
34
(c) Memory location: addr specifies the memory location to read from during
reads and the location to write to during writes. Its value can range from zero to
6N3-1.
The interpolation and ICI modules that interface to this memory never write to it; the wr
and data_in signals are included for completeness.
6.1.2.2 Memory Data Layout
It is assumed that the memory module contains the foward printer look-up table prior to
using the functionality of the interpolation and ICI modules. Furthermore, it is assumed
that the values are arranged in the memorymodule in the following fashion:
Memory Location Contents
0 Co
1 M0
2 Y0
3 Lo
4 A0
5 B0
6 Ci
7 Mi
8 Y,
9 Li
10 A,
11 Bi
35
6.1.3 Data Representation
The project specification required that two decimal places of accuracy be retained for the
LAB values. This meant that the design would have to implement floating point
operations. Floating point hardware is more complex and slower compared to fixed point
hardware. Additionally, the design resources did not have floating-point libraries that
contain pre-compiled and optimized floating-point modules. These libraries would either
have to be purchased, or a significant amount of time and effort would be required to
design custom floating-point arithmetic modules.
An alternative is to use fixed point arithmetic and scale all of the data to integers. The
approach allows for the use of pre-compiled and optimized fixed point integer modules,
which simplifies the design and reduces design time.
To preserve two decimal places, the data had to be scaled by a factor of 100. For
example, 123.45 would be scaled as 123.45*100 = 12345. Integer addition and
subtraction would occur normally. However, multiplications would require the result to
be scaled down. This can be shown by the following equations:
Original Operation: A * B = AB
With Scaled Data: ( 1 00A) * ( 100B) = 1 0000AB
However, we onlywant 100AB, which means this result must be scaled down by a factor
of 100. This requires the result to be divided by 100. This is a disadvantage of using
36
scaled fixed point arithmetic; multiplication operations must be followed by division.
This disadvantage is a fair trade-off compared to the complexity involved in
implementing a design to support floating-point numbers.
It was necessary to determine ifusing this integer-based approach would compromise the
accuracy of the ICI algorithm. The ICI C language implementation files were modified
to use only integer data in order to investigate the issue. The integer based ICI C code is
located on the accompanying CD under the "Software/integer-based
ICI" folder.
Simulation results showed that the accuracy of the integer-based implementation was
equal to its floating-point-based counterpart.
6.2 Synthesis Methodology
The control blocks of the trilinear interpolation and ICI modules are the only
synchronous (clocked) blocks in each of the designs. These blocks were synthesized for
speed, and the maximum path delay through them defined the clock and maximum
operating frequency for the entire design. All other blocks are asynchronous, and their
maximum path delays are either shorter or longer than the defined clocked. For those
blocks that have a shorter maximum path delay, there is no concern because the circuit
will provide valid results within the clock period. For those blocks that have a longer
maximum path delay, the control blocks must either wait the appropriate amount of clock
cycles for the results to become valid or continue on with processing. In the latter case,
computation in the block would continue as the control block would proceed with other
computations using other datapaths. Results would become valid at some later point in
37
the logic flow. This concept of parallel computation motivated the design of several
separate datapaths that perform different functions. Wherever this type of parallelism
could not be exploited, the control block simply waits the appropriate amount of clock
cycles for valid data.
There were no design specifications, such as area or speed, provided by the project
requirements for the hardware implementation. However, the design has application in
real-time and near real-time applications. It was also designed for ASIC implementation
in mind. For these reasons, the design blocks were synthesized for the fastest possible
performance. This is a reasonable synthesis constraint since the goal is real-time
computation, and the ASIC platform allows for larger and more complex designs as
compared to programmable logic, such as field programmable gate arrays (FPGA).
The asynchronous blocks were constrained by specifying input and output delays equal to
the defined clock. These constraints forced the synthesis tool to generate the fastest
possible design, but one that would never meet the timing constraints because of the
complexity of the operations being performed. Several of the synthesized
blocks'
timing
reports clearly show that they do not meet timing constraints. This is acceptable because
in this design methodology it is understood that these blocks require more than one clock
cycle to complete.
38
6.3 Simulation & Testing
The VHDL code was simulated using Mentor Graphic's ModelSim tool. Testbenches
were written in order to test and verify correct functionality of the hardware design.
These testbenches read input data from files, fed it to the module under test, and wrote
the output data into another data file. Other inputs were also controlled by the
testbenches; these inputs, such as algorithm parameters, were the same as those used in
the C language software simulations. The output data files written by the testbenches
were compared to the output data files written by the C software models. Correct
functionality of the VHDL design was guaranteed because the data contained in the
output files matched. The VHDL source code for the testbenches can be found in the
Appendix (see Section 9.2).
39
6.4 Hardware Modules
6.4.1 Trilinear Interpolation Module (trijop.vhd)
Figure 8 shows the I/O signals of the trilinear interpolation module, and Figure 9 shows
the block diagram of the module. Input and output signals are described in Table 5.
I -8
Figure 8: Top Level Diagram ofTrilinear InterpolationModule
40
O **t
5f AR 'f
'H3t'8 O-
"<? i ., erst.
r ADS?
-Oa;.
c
-f**>2atj: as
-Ozra: 2
.TERfc.
Figure 9: SubmoduleDiagram ofTrilinear Interpolation Module
41
Table 5: Trilinear Interpolation Module I/O Signals
Signal
Name
I/O Type Description
Clk Input stdlogic System clock.
Memin Input std_logic vector[31:0] Input data from external memory.
Start Input std_logic Interpolation operation begins on the rising
edge of this signal.
XO Input std_logic_vector[3 1 :0] Specifies the first component of the input
point.
XI Input std_logic_vector[3 1 :0] Specifics the second component of the
input point.
X2 Input std_logic_vector[3 1 :0] Specifies the third component of the input
point.
Done Output stdlogic Signals the completion of the interpolation
operation and valid output.
Error Output std_logic Signals an error occurred during the
interpolation operation. Note: An error can
occurs if the input point specified by XO,
XI, and X2 is located outside of the input
space defined in the LUT.
Memaddr Output std_logic_vector[3 1 :0] Specifies the location of the desired data in
memory.
ZO Output std_logic_vector[3 1 :0] First component of the output value; valid
when done=l and error=0.
Zl Output std_logic_vector[3 1 :0] Second component of the output value;
valid when done=l and error=0.
Z2 Output std_logic_vector[3 1 :0] Third component of the output value; valid
when done=l and error=0.
6.4.1.1 Operation
The system clock must have a period of 40 nanoseconds or greater; this corresponds to a
maximum operating frequency of 25 MHz. Simulation results show that this design can
perform trilinear interpolation for an input point in 2.5 ps when operating at 25 MHz.
42
The rising edge of start begins interpolation at the input point specified by signals XO, XI
and X2. Done is asserted when interpolation is complete. If error is not raised, valid
output occurs on signals ZO, Zl and Z2. Memin and memaddr are interface signals to
the external memory module. Memin is data coming from the memory; memaddr is the
location in memory of desired data and is controlled by the interpolation module's
control block.
6.4.1.2 Sub-blocks
6.4.1.2.1 TRI_CTRL_BLOCK (trictrl.vhd)
This block contains the control logic needed to carry out the interpolation. Its input and
output signals interface to the other sub-blocks and to the external memory module to
control data flow. It is also responsible for beginning the interpolation when start is
raised and for asserting done and error when the interpolation is complete.
The top level diagram is shown in Figure 10. A lower level block diagram is not
available because it contains too many low-level components (primarily registers) and
interconnects to be displayed on paper.
43
ADDl.^KS.uLTtj'. *21
c^O-
fCHlNI3iai
sta*- rj>-
SUB1.RESULT131 i!3
SuB2.,.RE5Ui.7i3la
?>jy:i..RESULT'3! JZ|
T. *E.SU-7t3!81
x*r3! ;?i
x.H3l>81
X2C21 :8.
OADDlAt:ji .a)
OAHJ1Bl3l=ai
-ODO
-t>&SOS?
4J*^Su8lAt31tai
OsuBib;^: 31
-^:>wH2A;21 -21
Os-^;'::i: -*8J
f3>s.u:i:tA:a: a;
^g^P*&Jl&4 : ' . '&>.
f^T*I-MXnJB>-334l (31 z:
HD^'^i A.j.^ ;>>_BC!3l :SI
OtI.A0aR_SP_IN3 131 >B!
PTgi-ADaajy-m !3i -0!
P TgX-AICTJP 3>ft 131 Bi
O"^X-ADOR -DP.JCNB 1 3 1 m
"O TKI..ADLK_0>*_I>i/ *3i m
">Uli3! <|]
O
"-
-CJI3I :J
-5*T_C2: 31 '81
4S>T-X3 5 3!:ii!;
<0*..r^!31 'Bi
f^1"-CSi3i -a.
4Q>Tj:8f3t k-
|">T.X?!3! :2;
f^".. ;;xf3l ;!
f^TJjriSI =81
>0"_D2t3I^l
-0/;i3t <bi
0/tni:^.
n -gi
Figure 10: Top LevelDiagram ofTrilinear Interpolation TRI CTRL Block
The area of this block is 18,148 cells. The maximum path delay through this block is
39.06 nanoseconds.
44
6.4.1.2.2 TRI ADDR BLOCK (triaddrblock.vhd)
This block defines the datapath for calculating the memory address of a desired value in
memory. The data needed from memory are the lattice points of the cell containing the
input point and their corresponding points in the output. These lattice points are located
by indices which were determined by searching the input space for the appropriate cell
containing the input point. These indices are input to this block which computes the
memory address of the desired values.
The top level diagram showing input and output signals of this block can be found in
Figure 1 1 . A low level block diagram is shown in Figure 12.
INZt3l"81 D*-
!N3!3I '0)O-
33N5131-81 O"
1NBI3I :'B'CJ>"
"C^r 131-01
T'" *>
Figure 11: Top Level Diagram ofTrilinear Interpolation TRI_ADDR_BLOCK
45
Figure 12: Datapath Architecture ofTRI ADDRJBLOCK
The area of this block is 20,014 cells. The maximum path delay through this block is
38.87 nanoseconds.
6.4.1.2.3 TRI CX BLOCK (tri_cx_block.vhd)
This block defines the datapath for calculating the CI through C7 values (henceforth
known as CX values) used to calculate the output. These values are calculated from the
lattice points of the cell containing the input point. Only three subtractors and one adder
46
is needed to compute all seven values because these units are shared due to pipelining of
operations by the control block.
The top level diagram showing input and output signals of this block can be found in
Figure 13. A low level block diagram is shown in Figure 14.
Ar;31A[31-'B3
a::::ibi2: -ei f^>-
5l.;B1Ani:g)
SUB2A[3! <gj
sugzB'3*
-b'i
&J33A I'J 1 : ii
St63Bl3t s^J
A:.:-Dl_??rSULTl31rgI
5u3! ??fcSJLTtjt =81
SUB2_RESULT{31 ^81
-OstS3_SESULT t31 -2!
TI_CX_3_
Figure 13: Top Level Diagram ofTrilinear Interpolation TRI CX BLOCK
Figure 14: DatapathArchitecture ofTRI_CX_BLOCK
47
The area of this block is 3,658 cells. The maximum path delay through this block is
10.36 nanoseconds.
6.4.1.2.4 TRI TERM BLOCK (tritermblock.vhd)
This block defines the datapath for calculating the interpolated values; it is the workhorse
of the interpolation module. It uses the CX values produced by the TRICXBLOCK
and several other inputs provided by the control block to compute the interpolated values.
The top level diagram showing input and output signals of this block can be found in
Figure 15. A low level block diagram is shown in Figure 16. The disadvantage ofusing
scaled data can be seen in this low level diagram; there are divide units throughout the
datapath that are needed to scale down the results of the multiplication-rich calculation
performed by this block. These divides units add significant area and delays to this
block.
C1131 -?fj
CI 1.31 -82 O*"
w 2 t 3 * ''$>% \m^m
C4C31 =ai >*
C5 f 3 1 ; 2 1 |m|3>
Cnt31 %]
DX131 -ei
Iff j 'j J ; g 1
'Jl. I .J i <0 i |[:...:|dP"
RE9ULTC31 =81
Figure 15: Top Level Diagram ofTrilinear Interpolation TRI_TERM BLOCK
48
COP 1:0]
Cl[31
DX[31:
C3[31:0]~
C4[31:0]
C5[31:0]-
ESULT[31:0]
Figure 16: Datapath Architecture ofTRIJTERM BLOCK
49
The area of this block is 148,537 cells. The maximum path delay through this block is
445.6 nanoseconds.
6.4.1.3 Summary ofResults
It is important to note that the multiply units in the sub-block designs operate on 32-bit
integer input; the 64-bit result is truncated to the 32 most significant bits, which can result
in truncation error.
Table 6 summarizes the area and timing results of the synthesized blocks for the trilinear
interpolation module. Note that TRITERMBLOCK requires multiple clock cycles to
produce valid output. During this time, the control block will be exploiting parallelism
by continuing with other processing, if possible. If not, it will simply wait the required
number of clock cycles for valid data.
Table 6: Trilinear Interpolation Module Synthesis Results
Block Area (cells) Timing (ns)
TRI CTRL BLOCK 18,148 39.06
TRI ADDR BLOCK 20,014 38.87
TRI CX BLOCK 3,658 10.36
TRI TERM BLOCK 148,537 445.6
Total 190,357
50
6.4.2 ICI Module (icijop.vhd)
Figure 17 shows the I/O signals of the ICI module; Figure 18 shows the block diagram of
the module. Table 7 describes the input and output.
o-
1NVDEL
X.~.i i. u i * id j
EP131 --21
)UDC31:3)
:\VDEL_TWDi3I:BJ
K_MA.X[31 :8J
MEH1N131 =Bj
1_.H1_S0 31=01 O-
Pt^LCX.BOUND! 31 =0
P2_HI_B0UNDI3 :(dJ
P2_L0. 0 [ 3 1 : R 1
'3_HI_B0UN[
P......SIZEI-
1 : 0]
: 8
Afi
' o-
TRI_DONE O-
TRI.ZaOl =01
TRI_Z1 131 ; 8 ]
TRI_Z2f31 :3]
Xi {31 ; 0 3
X2 { Jl : 0 ;
X3I31 -2]0
"O
jtbi31 -a:
ii'SV i i
'
"13 1 ; 0 J
ICI.
_j -v, *
2131 :0J
_ADDR [31 ; 83
_MUX_SEL
P T c;
4"
^k' k
R 1X0 131 < B 3
\ X_X I L CJ i B J
"Rl X2 [31 : :
Figure 17: Top Level Diagram of ICIModule
51
c...0
DEL fir, -z
Efrn'^iO
If.
nO--
XNV0EL.~QUD131 "81
rv.--a n*i3i>Bj|
njf3l;0?O--
P l H1...B0;j^0 I 3 1
Pt_LD_BCUNDC31
P2HX_BOUWH31
s2_L0..,.8OU* ! 3 1 a?0+
?3_HI_SOyN0 ' 3 1 =Rl>- -
P3-LD. 30UN0 ; 3 1 : 2i^ -
P.3.LZE< 111 -BlO--
SfAsrf2>
'S3.-Z8C3!B.>04-
mi. t\ j5i.z;||3>--
TSi...Z2!ai mfmp--
XllTi =ii|I>4"
X2l31BlI
X313! KiOH*
rCI.CT8L
- nrs.D"
ICI.11?
-OQ3NE
-HJJ> IW_CUT8 131 ? 5
-J3>-^v,..cuTi ta: : 21
-O;\v..0u-2S3i'2i
O'^J
-OTsi_xar3!..n
-|3>trx.x; :ai:Bi
Figure 18: Submodule Diagram of ICI Module
52
Table 7: ICIModule I/O Signals
Signal Name I/O Type Description
Clk Input Stdlogic System clock.
Start Input Std_logic ICI operation begins on the rising
edge of this signal.
Memin Input std logic vector[31:01 Input data from external memory.
p_size Input std_logic_vector[3 1 :0] Specifies the number of rows in the
LUT, where each row contains the
three components of a point in the
input space and the three components
of the corresponding point in the
output space.
XI Input std_logic_vector[3 1 :0] Specifies the first component of the
input point.
X2 Input std_logic_vector[3 1 :0] Specifics the second component of
the input point.
X3 Input std_logic_vector[3 1 :0] Specifies the third component of the
input point.
Del Input std_logic_vector[3 1 :0] Specifies the delta amount to use for
calculating gradient matrix
derivatives.
Invdelowd Input std_logic_vector[3 1 :0] Specifies the multiplier constant to
use when calculating a forward OR
backward gradient matrix derivative.
Note: Must be equal to 10,000/del
Invdel_twd Input std_logic_vector[3 1 :0] Specifies the multiplier constant to
use when calculating a forward AND
backward gradient matrix derivative.
Note: Must be equal to 10,000/2*del
Kmax Input std_logic_vector[3 1 :0] Specifies the maximum number of
iterations to stop ICI iteration.
Ep Input std_logic_vector[3 1 :0] Specifies the threshold to stop ICI
iteration.
Mu Input std logic_vector[31:0] Specifies the value for u used in the
ICI update equation.
Pl_lo_bound**
Input std_logic_vector[3 1 :0] Specifies the lower bound of the first
component of the input space.
Plhibound Input std_logic_vector[3 1 :0] Specifies the upper bound of the first
component of the input space.
P2_lo_bound Input std_logic_vector[3 1 :0] Specifies the lower bound of the
second component of the input space.
P2_hi_bound Input std_logic_vector[3 1 :0] Specifies the upper bound of the
second component of the input space.
P3_lo_bound Input std_logic_vector[3 1 :0] Specifies the lower bound of the third
component of the input space.
53
P3 hi bound Input Std_logic_vector[3 1 :0
]
Specifies the upper bound of the third
component of the input space.
Tri zO Input Std_logic_vector[31:0
]
First component of an interpolated
value from the trilinear interpolation
module.
Tri zl Input Std_logic_vector[3 1 :0
]
Second component of an interpolated
value from the trilinear interpolation
module.
Tri z2 Input Std_logic_vector[3 1 :0
]
Third component of an interpolated
value from the trilinear interpolation
module.
Tri done Input Std_logic Signal from the trilinear interpolation
module indicating interpolation is
complete.
Done Output Stdlogic Signals the completion of the ICI
operation.
Memaddr Output Std_logic_vector[3 1 :0
J
Specifies the location of the desired
data in memory.
Inv outO Output Std_logic_vector[3 1:0
J
First component of the output value;
valid when done=l .
Inv outl Output Std_logic_vector[3 1 :0
J
Second component of the output
value: valid when done=l.
Inv out2 Output Std_logic_vector[3 1 :0
J
Third component of the output value;
valid when done=l .
Tri start Output Stdjogic Signals the trilinear interpolation
module to begin operation.
Tri xO Output Std_logic_vector[3 1 :0
J
First component of the input value to
the trilinear interpolationmodule.
Second component of the input value
to the trilinear interpolationmodule
Tri xl Output Std_logic_vector[31:0
J
Third component of the input value to
the trilinear interpolationmodule.
Tri x2 Output Std_logjc_vector[31:0
J
* Unlike the other input parameters to this module, k_max is NOT a scaled value. K_max specifies the
maximum number of iterations used to compute the inverse.
; For example, if the input colorspace was Lab:
Pl_lo_bound = 0 and Pl_bi_bound=100
P2_lo_bound = -127 and P2_hi_bound=+127
P3 lo bound = -127 and P3_bi_bound=+127
because 0<L<100
because-127<a<+127
because -127 < b < +127
54
6.4.2.1 Operation
The system clock must have a period of 40 nanoseconds or greater; this corresponds to a
maximum operating frequency of 25 MHz. Simulation results show that this design can
calculate the inverse for a given input point in 1 .3 ms when operating at 25 MHz. A
higher maximum operating frequency could be achieved if the design was targeted to a
vendor-specific sub-micron technology library. This would reduce the time required to
compute the inverse.
The rising edge of start begins ICI computation of input point specified by signals XI, X2
and X3. Done is asserted when computation is complete. The output components occur
on signals invoutO, invoutl and inv_out2. Memin and memaddr are interface
signals to the external memory module. Memin is data coming from the memory;
memaddr is the location in memory of desired data and is controlled by the interpolation
module's control block.
6.4.2.2 Input Description
The inverse printer map has an input space defined by equally subdividing the output
space off. This creates a
F1
with a structured input which allows for fast and efficient
interpolation. The drawback to this approach is that the input space will contain points
that are outside of the gamut defined by P. These points which are outside of the gamut
typicallywill not have an inverse, which results in a poorF .
55
In order to obtain a better inverse map, points which are outside of the gamut are first
mapped onto the gamut boundary. These "mapped" values will typically have an inverse,
which result in a much better approximation of the inverse. When using ICI to compute
the inverse, the "mapped" input points should be used. A more detail discussion of
gamut mapping is found in Section 2.3.
6.4.2.3 Sub-blocks
6.4.2.3.1 ICIJCTRL (ici_ctrl.vhd)
This block contains the control logic needed to carry out the ICI computation of the
inverse. Its input and output signals control data flow by interfacing to the other sub-
blocks, to the external memory module, and to the trilinear interpolation module. It is
also responsible for beginning the inverse computation when start is raised and for
asserting done when complete.
The top level diagram is shown in Figure 19. A lower level block diagram is not
available because it contains too many low-level components (primarily registers) and
interconnects to be displayed on paper.
56
c>CLX
SEL13UK3
IC2...DP.NEXr,VA:... ri* 8J
:NVDE:....TM3*3t 8!>i
MUKH'il
mxsm \v, '-. J>-
'i,..acuKci:H. :g; 5*
>3JL0.BOAOI:>1 -II
P-SIZEMre'.O-
TK'l,..2g|:s-
:8'l !*?>
XI 13! :8] O"
X2i31 ig] g>
K3I3! >}} J3>"
PI.
"O *DD_CALC-.2N! 13: g :
Aoa.c_c-ai3: ai
2>ado* C*LCi a t a : : 8 2
QJCM.C-XH31'8}
03.r,v.X.J{2S3i ?,
.calx.
??,: ji *f
>" :*
'
-2131*81
"C>rCI_DP_CU*V*_ 133 Jt
PXCI-Q^ . swa ::r.: a i
f5CI_.,.tMiKi. 131 * I i
>Oici....>?jnyi3!
8'
C>it:..^.xai.ai <-..
ICr_0P._*<H3! =83
OTCT..C-.~.:3T 2:
"C>ICiaF_.rjr3! 81
Ciw-xamn3i<ai
C>r^vjxrrir3i8i
0:;::
.xxgxai
Figure 19: Top Level Diagram of ICI_CTRL Block
The area of this block is 43,359 cells. The maximum path delay through this block is
39.20 nanoseconds.
6.4.2.3.2 ADDR CALC (addrcalc.vhd)
This block defines the datapath for calculating the memory address of a desired value in
memory. Two values provided to this datapath are the index for a given entry in memory
57
and the offset of the value in that entry. These values are controlled by the ICI control
block.
The top level diagram for this block is shown in Figure 20. The lower level block
diagram is shown in Figure 21 .
:\H3i-rj
- 2 1
Figure 20: Top Level Diagram of ICI ADDR_CALC Block
Figure 21: Datapath Architecture of ICI ADDR_CALC Block
58
The area of the block is 6,608 cells. The maximum path delay through this block is 31.37
nanoseconds.
6.4.2.3.3 DIS DP (disdp.vhd)
This block defines the datapath needed to compute the Euclidean distance between two
points. The distance between two points is required at several points in the ICI algorithm
such as when computing the initial estimate and when computing the difference between
current and previous computed output values to determine when to stop iterating.
The top level diagram for this block is shown in Figure 22. The lower level block
diagram is shown in Figure 23.
..,XjTI31-^l
Figure 22: Top Level Diagram of ICI DIS_DP Block
59
x*wii
yi.v. ?i>
fii'r.-.tif^t
?;[^
'A:.^^. ,. fnilnrtlll -32
Figure 23: Datapath Architecture of ICI DIS_DP Block
The area of the block is 23,122 cells. The maximum path delay through this block is
183.46 nanoseconds.
60
6.4.2.3.4 ICI DP (icidp.vhd)
This block defines the datapath that implements the update equation of the ICI algorithm.
Inputs to this block are controlled by the ICI control block.
The top level diagram for this block is shown in Figure 24. The lower level block
diagram is shown in Figure 25.
OJRSVA: r 3 1-0 1 HJJjN
GVE131 -01 O""
CX-lt31 = ! f^-
ai2r:u -'8! O-
X,V..; . 13! a I O-
*x i3i;aiC^ TCI OP
X2I31.1I |j^-
^.,%? >*&..
-'fJ'''
nvn-w. \Z&
X2i:i! -81 f^>-
risiai -0i C>-
tnaisii f^
^2 131*81O*
^C^fCXTV^Ltli
Figure 24: Top Level Digram of ICI_DP Block
61
Figure 25: Datapath Architecture of ICI_DP Block
The area of the block is 36,469 cells. The maximum path delay through this block is
159.96 nanoseconds.
62
6.4.3 Summary of Results
Again, note that the multiply units in the sub-block designs operate on 32-bit integer
input; the 64-bit result is truncated to the 32 most significant bits, which can result in
truncation error.
Table 8 summarizes the area and timing results of the synthesized blocks for the ICI
module. Note that the DISDP and ICI_DP blocks require multiple clock cycles to
produce valid output. During this time, the control block will be exploiting parallelism
by continuing with other processing, if possible. If not, it will simply wait the required
number of clock cycles for valid data.
Table 8: ICIModule Synthesis Results
Block Area (cells) Timing (ns)
ICI CTRL 43,359 39.20
ADDR CALC 6,608 31.37
DIS DP 23,122 183.46
ICI DP 36,469 159.96
Total 112,683
6.5 System Overview
Figure 26 illustrates how the interpolation and ICI modules would be used in a real
system; the arrows represent communication channels. As discussed previously, an
external memory module would contain the data defining the P look-up table.
Communication channels would exist between the interpolation, ICI, and external
memory modules. Operating at the maximum frequency of 25 MHz, the system can
construct the inverse printer map
P'1 for a 133 forward map P in approximately 2.9
63
seconds (13 * 1.3ms). The computation time could be reduced if the modules operated
with a higher clock frequency. This could be achieved by synthesizing this design to a
vendor-specific sub-micron technology library.
X
(LAB)
Z> ICI Module
7^>
iz:
C> (CMY)
External Memory
Module
(P LUT)
zy
iz.
Trilinear
Interpolation
Module
Figure 26: Diagram of System to Compute P"1
Also note that the trilinear interpolation module can be used alone with an external
memory containing an existing
P'1
as illustrated in Figure 27. This configuration would
provide a method of interpolating image data through
P~
. Operating at the maximum
frequency of 25 MHz, this system could interpolate a 512x512 image in approximately
0.66 seconds (512*512*2.5ps). Again, this computation time could be reduced if the
design was synthesized with a sub-micron technology library.
64
External Memory
Module
(F1
LUT)
X
(LAB) $>
:zy
iz
Trilinear
Interpolation
Module
k> (CMY)
Figure 27: Diagram of System to Interpolate Image Data
7 Concluding Remarks
7. 1 Conclusions
In this thesis project, three algorithms that compute the inverse of a forward printer map
have been benchmarked. The algorithms under investigation were the Shepard's
Interpolation, Moving Matrix, and Iteratively Clustered Interpolation (ICI) algorithms.
The software simulations showed that the ICI algorithm was clearly the best algorithm in
terms of accuracy and speed. A review of the algorithm and an initial hardware
architecture to implement the ICI algorithm showed that this algorithm's complexity does
not far exceed the complexity of the other two algorithms.
The ICI algorithm was implemented in VHDL and synthesized to a generic library. The
area and timing results of the synthesized design showed that this algorithm can be
implemented in hardware to target an application specific integrated circuit (ASIC). The
VHDL source code and synthesized netlists provide a working design. Synthesizing to
65
the Synopsys generic library resulted in an ICI module containing 112,683 cells and a
trilinear interpolation module containing 190,357 cells. The timing of the modules
resulted in a maximum operating frequency of 25 MHz, corresponding to a 40
nanosecond clock period. Higher frequency of operation could be achieved if the design
was synthesized to a deep sub-micron CMOS technology library.
7.2 Future Work
The future work of this project would take the hardware implementation further. The
design can be mapped to a specific and current target technology library. Next, place and
route would be performed for the design. Then, a real ASIC could be fabricated.
Ultimately, the ASIC would be embedded into a printer's electronics. A printer with an
inline sensor that measures output color, along with this embedded ASIC, would provide
themajor components needed for an automatic and real-time color correction system.
7.3 Acknowledgements
I would like to thank my committee members, Drs. K. Hsu, S. Dianat, and A. Mathew,
for their insights and guidance throughout this thesis work. I would also like to thank Dr.
L. K. Mestha, our industry contact at Xerox, for his insight and guidance and for
overseeing the research conducted during this project. This research was supported by a
grant from the Center for Electronic Imaging Systems (CEIS), the New York State Office
of Science, Technology and Academic Research (NYSTAR) and Xerox Corporation. A
66
big thanks to all of the aforementioned for everything they have taught me about color
correction and the printing process and industry. Also thank you for the opportunity to
work and study in such a challenging and timely area of research.
Lastly, I would like to thank Kishore Narreddy, a graduate student in the Electrical
Engineering department at RIT, for his collaborative efforts in conducting this research. I
would like to specifically thank Kishore for sharing his knowledge of gamut mapping and
for providing the gamut mapping code. Also thanks to Kishore for his help in verifying
the results of the software simulations.
67
8 References
[1] S. A. Dianat, L. K. Mestha, D. E. Viassolo, and Y. R. Wang. A practical algorithm
for the inversion of an experimental input-output colormap for color correction.
October 2001.
[2] R. Balasubramanian, M. S. Maltz. Refinement ofprinter transformations using
weighted regression. SPIE Proceedings Vol. 2658, 1996.
[3] D. Shepard. A two-dimensional interpolation function for irregularly-spaced data.
Proc. ACMNat. Conf, Pages 517-524, 1968.
[4] M. W. Martin, K. K Narredy. Analysis ofPrinter Inversion Models. April 2002.
UNPUBLISHED.
[5] H. R. Kang. Color Technologyfor Electronic ImagingDevices. SPIE Press, 1997.
[6] R. Groff, P. Khargonekar, D. Koditschek, T. Thieret, and L. K. Mestha. Modeling
and Control ofXerographic Processes. Proceedings of the 38th Conference on
Decision and Control. December 1999.
[7] M. C. Stone, W. B. Cowan, J. C. Beatty. Color GamutMapping and the Printing of
Digital Color Images. ACM Transactions on Graphics, Vol. 7, No. 4. October 1988.
[8] R. Groff, P. Khargonekar, D. Koditschek, and T. Thieret. Representation ofColor
Space Transformations for Effective Calibration and Control. IS&TNIP16: 2000
International Conference on Digital Printing Technologies, 2000.
[9] Po-Chieh Hung. Colorimetric calibration in electronic imaging devices using a look
up-table model and interpolations. Journal ofElectronic Imaging, Vol 2(1). January
1993.
[10] L. K. Mestha, Y. R. Wang, M. A. Scheuer, and T. E. Thieret. AMultilevel Modular
Control Architecture for Image Reproduction. Proceedings ofthe 1998 IEEE
International Conference on ControlApplications . September 1998.
[1 1] E. K. Chong and S. H. Zak. An Introduction to Optimization. JohnWiley & Sons
Inc., New York, 1996.
68
9 Appendices
9. 1 Software Simulation Accuracy Results
In this appendix, the accuracy results of each algorithm for varying parameters are
presented. The tables show the accuracy statistics; from them the optimal parameter
values are determined. The algorithms and their parameters are discussed in detail in
Section 5.2.
69
9.1.1 Shepard's Interpolation
p MU
Mean
Delta E Std Dev
Mean+2
*Std Min Max
1 33.55 10.97 55.50 2.24 78.17
2 17.61 6.89 31.38 0.64 47.32
3 5.20 3.00 11.20 0.15 19.76
4 1.97 0.93 3.82 0.12 7.63
5 1.72 0.60 2.92 0.15 3.64
6 1.82 0.65 3.13 0.15 4.04
2 1 33.30 10.97 55.24 2.18 77.57
2 2 17.36 6.97 31.29 0.71 49.18
2 3 5.00 2.91 10.82 0.15 18.31
2 4 1.86 0.80 3.47 0.15 6.55
2 5 1.64 0.54 2.73 0.15 3.81
2 6 1.76 0.60 2.97 0.13 3.27
3 1 33.32 10.98 55.28 2.16 77.19
3 2 17.56 7.06 31.68 0.70 50.06
3 3 5.17 3.02 11.21 0.15 18.01
3 4 1.90 0.84 3.57 0.15 6.54
3 5 1.65 0.55 2.74 0.15 4.01
3 6 1.76 0.60 2.97 0.12 3.68
4 1 33.11 10.92 54.95 2.14 76.59
4 2 17.18 6.91 31.14 0.60 49.22
4 3 4.95 2.92 10.79 0.15 17.25
4 4 1.87 0.81 3.49 0.15 6.23
4 5 1.66 0.56 2.78 0.15 4.06
4 6 1.78 0.62 3.02 0.15 3.86
5 1 33.07 10.91 54.88 2.13 76.35
5 2 17.14 6.98 31.10 0.59 49.08
5 3 4.95 10.81 10.81 0.15 17.54
5 4 1.88 0.82 3.52 0.15 6.49
5 5 1.67 0.57 2.81 0.13 4.11
5 6 1.79 0.63 3.05 0.15 3.97
6 1 33.10 10.91 54.92 2.12 76.27
6 2 17.25 7.01 31.26 0.61 49.25
6 3 5.03 2.98 10.99 0.15 18.01
6 4 1.90 0.84 3.58 0.15 6.76
6 5 1.67 0.58 2.83 0.15 4.15
6 6 1.79 0.64 3.07 0.15 4.04
70
9.1.2 Moving Matrix
MU Mean Std Dev
Mean
+2*StdDev Min Max
1 17.45 9.65 36.75 0.29 52.08
2 10.36 6.27 22.9 0.05 36
3 3.87 2.68 9.23 0.06 15.66
4 1.61 2.53 6.66 0.03 77.74
5 1.28 2.29 5.86 0.02 54.88
6 1.85 5.72 13.28 0.03 77.74
7 3.1 9.24 21.58 0.01 88.25
9.1.3 ICI
EP = 0.5 (Constant)
KMAX = 50 (Constant)
DEL MU Mean Std Dev
Mean
+2*StdDev Min Max
5 1 0.49 0.14 0.76 0.05 2.01
5 2 0.45 0.10 0.66 0.05 1.85
5 3 0.43 0.11 0.65 0.05 1.68
5 4 0.43 0.43 1.30 0.05 14.57
5 5 0.57 2.00 4.56 0.03 47.42
5 8 4.23 12.93 30.10 0.05 91.03
5 10 10.37 20.65 51.66 0.05 104.76
7 1 0.50 0.13 0.75 0.05 1.97
7 2 0.46 0.09 0.65 0.05 1.68
7 3 0.44 0.10 0.64 0.05 1.68
7 4 0.43 0.40 1.23 0.05 15.92
7 5 0.58 1.98 4.54 0.04 47.11
7 8 4.65 13.56 31.77 0.04 92.51
7 10 11.06 21.21 53.49 0.02 103.13
10 1 0.50 0.13 0.75 0.08 1.95
10 2 0.46 0.09 0.64 0.07 1.68
10 3 0.44 0.10 0.63 0.08 1.68
10 4 0.45 0.50 1.45 0.06 14.57
10 5 0.59 2.00 4.59 0.03 43.57
10 8 4.48 13.31 31.09 0.05 92.40
10 10 11.23 21.28 53.79 0.02 104.69
12 1 0.50 0.13 0.77 0.10 1.97
12 2 0.46 0.09 0.64 0.10 1.68
12 3 0.44 0.10 0.63 0.10 1.68
12 4 0.44 0.41 1.26 0.07 13.19
12 5 0.57 1.84 4.26 0.04 41.53
12 8 4.21 12.94 30.08 0.02 89.18
12 10 10.58 20.71 52.00 0.03 101.85
71
MU = 4
DEL = 10
KMAX EP Mean Std Dev
Mean
+2*StdDev Min Max
25 0.5 0.45 0.46 1.37 0.06 13.73
25 0.6 0.52 0.46 1.44 0.06 13.73
25 0.7 0.59 0.47 1.52 0.06 13.73
25 0.8 0.65 0.47 1.59 0.08 13.73
25 1 0.78 0.48 1.74 0.08 13.73
25 1.3 0.94 0.51 1.97 0.08 13.73
25 1.5 1.05 0.54 2.13 0.08 13.73
50 0.5 0.45 0.50 1.45 0.06 14.57
50 0.6 0.52 0.50 1.52 0.06 14.57
50 0.7 0.59 0.50 1.60 0.06 14.57
50 0.8 0.65 0.51 1.67 0.08 14.57
50 1 0.78 0.52 1.82 0.08 14.57
50 1.3 0.95 0.55 2.04 0.08 14.57
50 1.5 1.05 0.57 2.20 0.08 14.57
75 0.5 0.44 0.46 1.37 0.06 13.90
75 0.6 0.51 0.46 1.44 0.06 13.90
75 0.7 0.59 0.46 1.52 0.06 13.90
75 0.8 0.65 0.47 1.59 0.08 13.90
75 1 0.78 0.48 1.74 0.08 13.90
75 1.3 0.94 0.51 1.97 0.08 13.90
75 1.5 1.05 0.54 2.13 0.08 13.90
100 0.5 0.45 0.50 1.45 0.06 14.57
100 0.6 0.52 0.50 1.52 0.06 14.57
100 0.7 0.59 0.50 1.60 0.06 14.57
100 0.8 0.65 0.51 1.67 0.08 14.57
100 1 0.78 0.52 1.82 0.08 14.57
100 1.3 0.95 0.55 2.04 0.08 14.57
100 1.5 1.05 0.57 2.20 0.08 14.57
72
9.2 VHDL Code
9.2.1 tri_top.vhd
-- Description: Trilinear Module (TOP)
-- Filename: tri_top.vhd
library IEEE,DWARE;
use IEEE. numeric_std. all;
use DWARE . DW_Foundat ion .all;
use IEEE. STD_LOGIC_1164 .all;
entity TRI_TOP is
port (CLK, START: in STD_LOGIC;
XO, XI, X2: in INTEGER;
DONE: out STD_LOGIC;
ERROR: out STD_LOGIC;
ZO, Zl, Z2: out INTEGER;
-- External Memory Control Signals
MEMIN: in INTEGER;
MEMADDR: out INTEGER) ;
end TRIJTOP;
architecture STRUCTURAL of TRIJTOP is
component TRI_ADDR_BLOCK
port (INI, IN2, IN3, IN4 , IN5, IN6 , IN7 : in INTEGER;
MEMADDR: out INTEGER);
end component;
component TRI_CX_BLOCK
port (SUB1A, SUB1B, SUB2A, SUB2B, SUB3A, SUB3B, ADD1A, ADD1B : in
INTEGER;
SUB1_RESULT, SUB2_RESULT, SUB3_RESULT, ADD1_RESULT: out
INTEGER) ;
end component
component TRI_TERM_BLOCK
port (CO, CI, C2, C3, C4, C5, C6, CI, DX, DY, DZ : in INTEGER;
RESULT: out INTEGER);
end component ;
component TRI_CTRL
port (CLK, START: in STD_LOGIC;
XO, XI, X2: in INTEGER;
ZO, Zl, Z2: out INTEGER;
DONE: out STD_LOGIC;
ERROR: out STD_LOGIC;
-- I/O Signals Trilinear DP
TRI_ADDR_DP_IN1 , TRI_ADDR_DP_IN2 , TRI_ADDR_DP_IN3 ,
73
INTEGER;
INTEGER;
TRI_ADDR_DP_IN4 , TRI_ADDR_DP_IN5 , TRI_ADDR_DP_IN6 ,
TRI_ADDR_DP_IN7 : out INTEGER;
-- I/O Signals Memory
MEMIN: in INTEGER;
-- I/O Signals to CX DP
SUB1A, SUB1B, SUB2A, SUB2B, SUB3A, SUB3B, ADD1A, ADD1B: out
SUB1_RESULT, SUB2_RESULT, SUB3 RESULT, ADD1 RESULT: in
-- I/O Signals to TERM DP
T_C0, T_C1, T_C2, T_C3, T_C4 , T_C5, T_C6 , T_C7,
T_DX, T_DY, T_DZ: out INTEGER;
T_RESULT: in INTEGER);
end component;
-- Internal Signals for TRI_ADDR_BLOCK
signal INT_IN1, INT_IN2 , INT_IN3 , INT_IN4 ,
INT_IN5, INT_IN6, INT_IN7 : INTEGER;
-- Internal Signals for TRI_CX_BL0CK
signal INT_SUB1A, INT_SUB1B, INT_SUB2A, INT_SUB2B, INT_SUB3A,
INT_SUB3B,
INT_ADD1A, INT_ADD1B, INT_SUB1_RESULT, INT_SUB2_RESULT,
INT_SUB3_RESULT, INT_ADD1_RESULT : INTEGER;
-- Internal Signals for TRI_TERM_BL0CK
signal INT_CO, INT_C1, INT_C2 , INT_C3 , INT_C4 , INT_C5 , INT_C6 ,
INT_C7 ,
INT_DX, INT_DY, INT_DZ, INT_RESULT: INTEGER;
begin
ADDR_BLK : TRI_ADDR_BLOCK
port map (IN1=>INT_IN1,
IN2=>INT_IN2,
IN3=>INT_IN3,
IN4=>INT_IN4,
IN5=>INT_IN5,
IN6=>INT_IN6,
IN7=>INT_IN7,
MEMADDR=>MEMADDR) ;
CX_BLK: TRI_CX_BL0CK
port map (SUB1A=>INT_SUB1A,
SUB1B=>INT_SUB1B,
SUB2A=>INT_SUB2A,
SUB2B=>INT_SUB2B,
SUB3A=>INT_SUB3A,
SUB3B=>INT_SUB3B,
ADD1A=>INT__ADD1A,
ADD1B=>INT_ADD1B,
SUB1_RESULT=>INT_SUB1_RESULT,
SUB2_RESULT= > INT_SUB2_RESULT ,
SUB3_RESULT= > INT_SUB3_RESULT ,
ADD1_RESULT=>INT_ADD1_RESULT) ;
TERM_BLK: TRI_TERM_BL0CK
74
port map (C0=>INT_C0,
C1=>INT_C1,
C2=>INT_C2,
C3=>INT_C3,
C4=>INT_C4,
C5=>INT_C5,
C6=>INT_CS,
C7=>INT_C7,
DX=>INT_DX,
DY=>INT_DY,
DZ=>INT_DZ,
RESULT=>INT_RESULT) ;
CTRL_BLK: TRI_CTRL
port map (CLK=>CLK,
START=>START,
X0=>X0,
X1=>X1,
X2=>X2,
ZO=>ZO,
Z1=>Z1,
Z2=>Z2,
D0NE=>D0NE,
ERROR=>ERROR,
-- I/O Signals Trilinear DP
TRI_ADDR_DP_IN1=>INT_IN1 ,
TRI_ADDR_DP_IN2 =>INT_IN2 ,
TRI_ADDR_DP_IN3 =>INT_IN3 ,
TRI_ADDR_DP_IN4 =>INT_IN4 ,
TRI_ADDR_DP_IN5=>INT_IN5 ,
TRI_ADDR_DP_IN6=>INT_IN6 ,
TRI_ADDR_DP_IN7=>INT_IN7 ,
-- I/O Signals Memory
MEMIN=>MEMIN,
-- I/O Signals to CX DP
SUB1A=>INT_SUB1A,
SUB1B=>INT_SUB1B,
SUB2A=>INT_SUB2A,
SUB2B=>INT_SUB2B,
SUB3A=>INT_SUB3A,
SUB3B=>INT_SUB3B,
ADD1A= > INT_ADD1A ,
ADD1B=>INT_ADD1B,
SUB1_RESULT=>INT_SUB1_RESULT,
SUB2_RESULT= > INT_SUB2_RESULT ,
SUB3_RESULT= > INT_SUB3_RESULT ,
ADD1_RESULT=>INT_ADD1_RESULT,
-- I/O Signals to TERM DP
T_C0=>INT_C0,
T_C1=>INT_C1,
T_C2=>INT_C2,
T_C3=>INT_C3,
T_C4=>INT C4,
75
T_C5=>INT_CB,
T_C6=>INT_C6,
T_C7=>INT_C7,
T_DX=>INT_DX,
T_DY=>INT_DY,
T_DZ=>INT_DZ,
T_RESULT=>INT_RESULT) ;
end STRUCTURAL;
76
9.2.2 tri ctrl.vhd
-- Description: Trilinear Control
-- Filename: tri_ctrl.vhd
library IEEE,DWARE;
use IEEE. numeric_std. all;
use DWARE.DW_Foundation.all;
-
-library IEEE;
use IEEE.STD_L0GIC_1164.all;
entity TRI_CTRL is
port (CLK, START: in STD_L0GIC;
XO, XI, X2: in INTEGER;
ZO, Zl, Z2: out INTEGER;
DONE: out STD_L0GIC;
ERROR: out STD_L0GIC;
-- I/O Signals Trilinear DP
TRI_ADDR_DP_IN1 , TRI_ADDR_DP_IN2 , TRI_ADDR_DP_IN3 ,
TRI_ADDR_DP_IN4 , TRI_ADDR_DP_IN5 , TRI_ADDR_DP_IN6 ,
TRI_ADDR_DP_IN7 : out INTEGER;
-- I/O Signals Memory
MEMIN: in INTEGER;
-- I/O Signals to CX DP
SUB1A, SUB1B, SUB2A, SUB2B, SUB3A, SUB3B, ADD1A, ADD1B : out
INTEGER;
SUB1_RESULT, SUB2_RESULT, SUB3_RESULT, ADD1_RESULT : in INTEGER;
-- I/O Signals to TERM DP
T_CO, T_C1, T_C2, T_C3, T_C4 , T_C5 , T_C6, T_C7 ,
T_DX, T_DY, T_DZ: out INTEGER;
T_RESULT: in INTEGER);
end TRI_CTRL;
architecture BEHAVIORAL of TRI_CTRL is
begin
process
variable I: INTEGER range 0 to 13;
variable x_lo_pos_temp, y_lo_pos_temp, z_lo_pos_temp,
x_lo_pos, y_lo_pos, z_lo_pos,
x_hi_pos, y_hi_pos, z_hi_pos : INTEGER range 0 to 12;
variable x_lo_temp, y_lo_temp, z_lo_temp,
x_lo, x_hi, y_lo, y_hi, z_lo, z_hi : INTEGER;
variable found_x, found_y, found_z : BOOLEAN;
variable POOI, POIO, P011, PIOI: INTEGER;
variable CO, CI, C2, C3, C4, C5, C6, C7 : INTEGER;
variable C4_TEMP, C5_TEMP, C6_TEMP, C7JTEMP1, C7_TEMP2 : INTEGER;
variable RESULT1, RESULT2 , RESULT3 : INTEGER;
77
begin
loop
TRI_ADDR_DP_IN1 <= 0 ;
TRI_ADDR_DP_IN2 <= 0
TRI_ADDR_DP_IN3 <= 0
TRI_ADDR_DP_IN4 <= 0
TRI_ADDR_DP_IN5 <= 0
TRI_ADDR_DP_IN6 <= 0
TRI_ADDR_DP_IN7 <= 0
ERROR <= ' 0 ' ;
wait until CLK' event and CLK= ' l ' ;
DONE <= ' 0 ' ;
ERROR 0';
if (START = '1') then
found x : = FALSE ;
found_y := FALSE;
found z : = FALSE ;
I:=0;
TRI_ADDR_DP_IN1 <=
TRI_ADDR_DP_IN2 <= 6
TRI_ADDR_DP_IN3 <= I
TRI_ADDR_DP_IN4 <= 0
TRI_ADDR_DP_IN5 <= 0
TRI_ADDR_DP_IN6 <= 0
TRI ADDR DP IN7 <= 0
2;
wait until CLK' event and CLK= ' 1
I: =1+1;
TRI_ADDR_DP_IN1 <= 2,
TRI_ADDR_DP_IN2 <= 6;
TRI_ADDR_DP_IN3 <= I;
wait until CLK 'event and CLK=
x_lo_pos_temp := 1-1;
x_lo_temp := MEMIN;
y_lo_pos_temp := 1-1;
y_lo_temp := MEMIN;
z_lo_pos_temp := 1-1;
z_lo_temp := MEMIN;
I:=I+1;
TRI_ADDR_DP_IN1 <= 2;
TRI_ADDR_DP_IN2 <= 6;
TRI_ADDR_DP_IN3 <= I;
wait until CLK' event and CLK=
while I <= 13 loop
if XO <= MEMIN and found_x = FALSE then
x_lo_pos := x_lo_pos_temp;
x_lo := x_lo_temp;
X_hi_pos := 1-1;
X_hi := MEMIN;
Read Mem 2
78
fOUnd_X : = TRUE ;
else
x_lo_pos_temp := 1-1;
x_lo_temp := MEMIN;
end if;
if XI <= MEMIN and found_y = FALSE then
y_lo_pos := y_lo_pos_temp;
y_lo := y_lo_temp;
y_hi_pos := i-i;
y_hi := MEMIN;
found_y : = TRUE ;
else
y_lo_pos_temp : = I - 1 ;
y_lo_temp := MEMIN;
end if;
if X2 <= MEMIN and found_z = FALSE then
z_lo_pos := z_lo_pos_temp;
z_lo := z_lo_temp;
Z_hi_pos := 1-1;
Z_hi := MEMIN;
fOund_Z : = TRUE ;
else
z_lo_pos_temp := 1-1;
z_lo_temp := MEMIN;
end i f ;
I:=I+1;
TRI_ADDR_DP_IN1 <= 2;
TRI_ADDR_DP_IN2 <= 6;
TRI_ADDR_DP_IN3 <= I,
wait until CLK' event and CLK= ' 1 '
end loop;
if found_x /= TRUE or found_y /= TRUE or found_z /= TRUE then
ERROR <= ' 1 ' ;
else
I : =3 ;
ile I < 6 loop
-- Get POOO
TRI ADDR_DP_IN1 <= I;
TRI_ADDR_DP_IN2 < = 1014;
TRI_ADDR_DP_IN3 < = X lo_pOS;
TRI_ADDR_DP_IN4 < = 78;
TRI_ADDR_DP_IN5 < = y
lo_pos,-
TRI ADDR_DP_IN6 < = 6;
TRI ADDR_DP_IN7 < = Z lo_pOS;
SUB1A <= XO;
SUB1B <= x_lo;
SUB2A <= XI;
SUB2B <= y_lo;
SUB3A <= X2;
SUB3B <= z lo;
wait until CLK' event and CLK=
' 1 ' ;
-- Get P100
TRI_ADDR_DP_IN1 <= I;
TRI ADDR DP IN2 <= 1014;
79
TRI_ADDR_DP_IN3 <= x_hi_pos;
TRI_ADDR_DP_IN4 <= 78;
TRI_ADDR_DP_IN5 <= y_lo_pos ;
TRI_ADDR_DP_IN6 <= 6 ;
TRI_ADDR_DP_IN7 <= z_lo_pos;
T_DX <= SUB1_RESULT;
T_DY <= SUB2_RESULT;
T_DZ <= SUB3_RESULT;
wait until CLK' event and CLK= ' 1
-- POOO AvailaLble
CO := MEMIN; -
SUB1B <= MEMIN;
-- Get POIO
TRI_ADDR_DP_IN1 < = I;
TRI_ADDR_DP_IN2 < = 1014;
TRI_ADDR_DP_IN3 < = X lo_pOS;
TRI_ADDR_DP_IN4 < = 78;
TRI_ADDR_DP_IN5 < = y hi_pos;
TRI_ADDR_DP_IN6 < = 6;
TRI_ADDR_DP_IN7 < = Z_lo_poS ;
POOO
wait until CLK' event and CLK= ' 1 ' ;
-- P100 Available
SUB1A <= MEMIN; -- P100
-- Get P001
TRI_ADDR_DP_IN1 <= I;
TRI_ADDR_DP_IN2 <= 1014;
TRI_ADDR_DP_IN3 <= x_lo_poS;
TRI_ADDR_DP_IN4 <= 78;
TRI_ADDR_DP_IN5 <= y_lo_pos;
TRI_ADDR_DP_IN6 <= 6;
TRI_ADDR_DP_IN7 <= z_hi_pos;
wait until
CLK' event and CLK=
' 1 '
,
-- CI Available
CI := SUB1_RESULT;
-- P010 Available
P010 := MEMIN;
SUB1A <= P010;
-- Get P110
TRI_ADDR_DP_IN1 <= I ;
TRI_ADDR_DP_IN2 <= 1014;
TRI_ADDR_DP_IN3 <= x_hi_poS;
TRI_ADDR_DP_IN4 <= 78;
TRI_ADDR_DP_IN5 <= y_hi_pos;
TRI_ADDR_DP_IN6 <= 6 ;
TRI_ADDR_DP_IN7 <= z_lo_pos;
wait until
CLK' event and CLK=
' 1 '
-- C2 Available
C2 := SUB1_RESULT;
-- POOI Available
P001 := MEMIN;
SUB1A <= P001;
-- Get P101
TRI ADDR DP INI <= I;
80
<= x_hi_pos;
<= 78;
TRI_ADDR_DP_IN2 <= 1014;
TRI_ADDR_DP_IN3
TRI_ADDR_DP_IN4
TRI_ADDR_DP_IN5 <= y_lo_pos ,
TRI_ADDR_DP_IN6 <= 6;
TRI_ADDR_DP_IN7 <= z_hi_pos;
wait until CLK 'event and CLK= ' 1 '
-- C3 Availabl e
C3 := SUB1_RESULT;
-- P110 Avaiable
SUB1A <= MEMIN; -- P
SUB1B <= POIO;
-- Get P011
TRI_ADDR_DP_IN1 < = I;
TRI_ADDR_DP IN2 < = 1014;
TRI_ADDR_DP_IN3 < = x lo_pos
TRI_ADDR_DP_IN4 < = 78;
TRI_ADDR_DP_IN5 < = y hi_pos
TRI_ADDR_DP_IN6 < = 6;
TRI_ADDR_DP_IN7 < = z hi_pos
110
wait until CLK1 event and CLK= ' 1 '
C4_TEMP := SUB1_RESULT;
SUB1A <= C4JTEMP;
SUB1B <= CI;
-- P101 Available
P101 := MEMIN
SUB2A <= P101
SUB2B <= P001
-- Get Pill
TRI_ADDR_DP_IN1 <= I;
TRI_ADDR_DP_IN2 <= 1014;
TRI_ADDR_DP_IN3 <= x_hi_pos
TRI_ADDR_DP_IN4 <= 78;
TRI_ADDR_DP_IN5 <= y_hi_pos
TRI_ADDR_DP_IN6 <= 6;
TRI_ADDR_DP_IN7 <= z_hi_pos
wait until
CLK' event and CLK=
' 1
C4 := SUB1_RESULT;
C5_TEMP := SUB2_RESULT;
SUB1A <= C5_TEMP;
SUB1B <= CI;
-- P011 Available
P011 := MEMIN;
SUB2A <= P011;
SUB2B <= P001;
wait until
CLK' event and CLK=
C5 := SUB1_RESULT;
C6_TEMP := SUB2_RESULT;
SUB1A <= C6_TEMP;
SUB1B <= C2;
-- Pill Available
SUB2A <= MEMIN;
SUB2B <= P011;
Pill
81
SUB3A <= POOI;
SUB3B <= C4;
wait until CLK' event and CLK= ' 1 ' ;
C6 := SUB1_RESULT;
C7_TEMP1 := SUB2_RESULT;
C7_TEMP2 := SUB3_RESULT;
SUB1A <= C7JTEMP2;
SUB1B <= PIOI;
wait until CLK' event and CLK= ' 1 ' ;
ADD1A <= C7JTEMP1;
ADD1B <= SUB1_RESULT;
wait until CLK 'event and CLK= ' 1 ' ;
C7 := ADD1 RESULT;
T_C0 < = CO
T_C1 < = CI
T_C2 < = C2
T_C3 < = C3
T_C4 < = C4
T_C5 < = C5
T_C6 <= C6
T C7 < = C7
if (1=4) then
RESULT1 := T_RESULT;
elsif (1=5) then
RESULT2 := T_RESULT;
end i f ;
I:=I+1;
end loop,-
-- Required wait cycles for term_block
--to finish computation of last
-- component interpolation.
wait until CLK 'event
wait until CLK 'event
wait until CLK 'event
wait until CLK' event
wait until CLK 'event
wait until CLK 'event
wait until
CLK' event
wait until
CLK' event
wait until CLK 'event
wait until
CLK' event
wait until CLK 'event
wait until CLK 'event
ZO <= RESULT1;
Zl <= RESULT2;
Z2 <= T_RESULT;
DONE <= ' 1 ' ;
end if;
end if;
82
and CLK = '1
and CLK = '1
and CLK = '1
and CLK = '1
and CLK = ' 1
and CLK = '1
and CLK = '1
and CLK = '1
and CLK = '1
and CLK = '1
and CLK = '1
and CLK = '1
end loop,-
end process,-
end BEHAVIORAL;
83
9.2.3 tri ex block.vhd
-- Description: Trilinear CX Block
-- Filename: tri_cx_block.vhd
library IEEE,DWARE;
use IEEE .numeric_std. all ;
use DWARE . DW_Foundation . all ;
--library IEEE;
--use IEEE. STD_L0GIC_1164. all;
entity TRI_CX_BL0CK is
port (SUB1A, SUB1B, SUB2A, SUB2B, SUB3A, SUB3B, ADD1A, ADD1B : in
INTEGER;
SUB1_RESULT, SUB2_RESULT, SUB3_RESULT, ADD1_RESULT : out INTEGER)
end TRI_CX_BL0CK;
architecture BEHAVIORAL of TRI_CX_BLOCK is
begin
SUB1_RESULT <= SUB1A - SUB1B
SUB2_RESULT <= SUB2A - SUB2B
SUB3_RESULT <= SUB3A - SUB3B
ADD1_RESULT <= ADD1A + ADD1B
end BEHAVIORAL;
84
9.2.4 tri term block.vhd
-- Description: Term Block
-- Filename: tri_term_block. vhd
-- Revisions: 4/13/02 Creation.
library IEEE, DWARE;
--library IEEE;
use IEEE. NUMERICJSTD. all;
use IEEE. STD_LOGIC_1164. all;
use DWARE.DW_Foundat ion. all,
entity TRI_TERM_BLOCK is
port (CO, CI, C2, C3, C4 , C5 , C6, C7, DX, DY, DZ : in INTEGER;
RESULT: out INTEGER);
end TRI_TERM_BLOCK;
architecture BEHAVIORAL of TRI_TERM_BLOCK is
signal TERM1, TERM2 , TERM3 , TERM4 , TERM5 , TERM6 , TERM7 : INTEGER;
begin
TERM1 <= TO_INTEGER(TO_SIGNED( (C1*DX*47) , 32) /
TO_SIGNED (100000, 32) ) ;
TERM2 <= TO_INTEGER(TO_SIGNED( (C2*DY*47) , 32) /
TO_SIGNED (100000, 32) ) ;
TERM3 <= TO_INTEGER(TO_SIGNED( (C3*DZ*47) , 32) /
TO_SIGNED (100000, 32) ) ;
TERM4 <= TO_INTEGER(TO_SIGNED( (TO_INTEGER(TO_SIGNED( (C4*DX*22) ,32) /
TCJSIGNED (10000, 32) ) * DY),32) / TCJSIGNED (10000, 32) ) ;
TERM5 <= TO_INTEGER (TCJSIGNED ( (TO_INTEGER (TCJSIGNED ( (C5*DX*22) ,32) /
TO_SIGNED(10000,32) ) *DZ),32) / TCJSIGNED (10000 , 32) ) ;
TERM6 <= TO_INTEGER(TO_SIGNED( (TCJINTEGER (TCJSIGNED ( (C6*DY*22) ,32) /
TCJSIGNED (10000, 32) ) * DZ),32) / TCJSIGNED (10000 , 32 )) ;
TERM7 <=
TCJINTEGER (TCJSIGNED ( (TCJINTEGER (TCJSIGNED ( (TCJINTEGER (TCJSIGNED ( (C7*DX)
,32) / TCJSIGNED (10000, 32 ) ) * DY),32) / TCJSIGNED (100 , 32) ) * DZ),32) /
TCJSIGNED (10000, 32) ) ;
RESULT <= CO + TERM1 + TERM2 + TERM3 + TERM4 + TERM5 + TERM6 + TERM7 ;
end BEHAVIORAL;
85
9.2.5 tri addr block.vhd
-- Description: Trilinear Address Calculator
-- Filename: tri_addr_block. vhd
library IEEE, DWARE;
use IEEE. numeric_std. all;
use DWARE. DW_Foundat ion. all;
--library IEEE;
--use IEEE.STD_L0GIC_1164.all;
entity TRI_ADDR_BLOCK is
port (INI, IN2, IN3, IN4 , IN5 , IN6 , IN7 : in INTEGER;
MEMADDR: out INTEGER) ;
end TRI_ADDR_BLOCK;
architecture BEHAVIORAL of TRI_ADDR_BLOCK is
begin
MEMADDR <= (INI + (IN2*IN3)) + ((IN4*IN5) + (IN6*IN7))
end BEHAVIORAL;
86
9.2.6 tri_tb_top.vhd
library IEEE;
use IEEE. STD_LOGIC_1164. all;
use STD.TEXTIO.ALL;
entity TRI_TB_TOP is
end TRIJTBJTOP;
architecture BEHAVIORAL of TRI TB TOP is
component TABLEMEM
port (CLK: in STD_LOGIC;
WR: in STD_LOGIC;
ADDR: in NATURAL;
DIN: in INTEGER;
DOUT: out INTEGER);
end component;
Read: WR=0, Write: WR=1
component TRIJTOP
port (CLK, START: in STD_LOGIC;
XO, XI, X2 : in INTEGER;
DONE: out STD_LOGIC;
ZO, Zl, Z2: out INTEGER;
-- External Memory Control Signals
MEMIN: in INTEGER;
MEMADDR: out INTEGER);
end component;
signal INT_CLK: STD_LOGIC := ' 0 ' ;
Signal INT_START: STD_LOGIC := ' 0 ' ;
signal INT_DONE : STD_LOGIC;
signal INT_X0, INT_X1, INT_X2, INT_Z0, INT_Z1, INT_Z2 : INTEGER,
Signal INT_MEM_DATA, INT_MEM_ADDR : INTEGER;
signal INT_WR: STD_LOGIC; -- Not Used
signal INT_DIN: INTEGER; -- Not Used
begin
INT_CLK <= not INT_CLK after 2 0 ns ;
TEST: process
file IN_DATA_FILE: text is in "ici . CMYU. out " ;
file OUT_DATA_FILE : text is out " tri_top . out " ;
variable L: LINE;
variable XOTEMP, X1TEMP, X2TEMP: INTEGER;
variable ZOTEMP, Z1TEMP, Z2TEMP: INTEGER;
begin
while not endf ile (IN_DATA_FILE) loop
readline(IN_DATA_FILE, L) ;
read(L, XOTEMP) ;
read(L, X1TEMP) ;
read(L, X2TEMP) ;
87
INT_XO <= XOTEMP
INT_X1 <= X1TEMP
INT_X2 <= X2TEMP
INT_START <= ' 1'
wait until INT_DONE= ' 1 ' ;
Z0TEMP:= INT_ZO;
Z1TEMP:= INT_Z1;
Z2TEMP:= INT_Z2 ;
write (L, XOTEMP) ;
write (L, ' ' ) ;
write (L, X1TEMP) ;
write (L, ' ' ) ;
write (L, X2TEMP) ;
write (L, ' ' ) ;
write (L, ZOTEMP) ;
write (L, ' ) ;
write (L, Z1TEMP) ;
write (L, ' ' ) ;
write (L, Z2TEMP) ;
writeline(OUT_DATA_FILE, L) ;
end loop,-
wait;
end process;
MEM : TABLEMEM
port map (CLK=>INT_CLK, WR=>INT_WR, ADDR=>INT_MEM_ADDR,
DIN=>INT_DIN, DOUT=>INT_MEM_DATA) ;
DUT: TRI_TOP
port map (CLK=>INT_CLK, START=>INT_START,
X0=>INT_X0, X1=>INT_X1, X2=>INT_X2,
DONE=>INT_DONE, Z0=>INT_Z0, Z1=>INT_Z1, Z2=>INT_Z2
MEMIN=>INT_MEM_DATA, MEMADDR=>INT_MEM_ADDR) ;
end BEHAVIORAL;
88
9.2.7 ici_top.vhd
-- Description: ICI Module Top
-- Filename: ici_top.vhd
library IEEE;
use IEEE.STD_L0GIC_1164.all;
entity ICIJTOP is
port (CLK, START: in STCJLOGIC;
- - Input
XI, X2, X3 : in INTEGER;
- - Parameters
P1_L0_B0UND, P1_HI_B0UND,
P2_L0_B0UND, P2_HI_B0UND,
P3_LO_BOUND, P3_HI_B0UND,
P_SIZE, K_MAX, EP, MU,
DEL, INVDEL_0WD, INVDELJTWD : in INTEGER;
- - Output
DONE: out STCJLOGIC;
INVJOUTO, INV_0UT1, INV_OUT2 : out INTEGER;
-- Memory Module I/O Signals
MEMIN: in INTEGER;
MEM_MUX_SEL: out STCJLOGIC;
MEM_ADDR: out INTEGER;
-- Trilinear Interpolation Module I/O Signals
TRI_START: out STCJLOGIC;
TRI_D0NE: in STCJLOGIC;
TRI_X0, TRI_X1, TRI_X2 : out INTEGER;
TRI_Z0, TRI_Z1, TRI_Z2: in INTEGER);
end ICIJTOP;
architecture STRUCTURAL of ICIJTOP is
component ICI_CTRL
port (CLK, START: in STCJLOGIC;
XI, X2, X3: in INTEGER;
- - Input Parameters
PIJXJBOUND, P1_HI_B0UND,
P2_L0_B0UND, P2_HI_B0UND,
P3_L0_B0UND, P3_HI_B0UND,
P_SIZE, K_MAX, EP,
DEL , INVDELJDWD , INVDEL_TWD ,
MU: in INTEGER;
DONE: out STCJLOGIC;
INV_OUT0, INV_0UT1, INV_0UT2 : out INTEGER;
-- External Memory Signals
MEMIN: in INTEGER;
89
MEM_MUX_SEL: out STCJLOGIC;
-- Memory Address Calculator I/O Signals
ADDR_CALC_IN1, ADDR_CALC_IN2 , ADDR_CALC_IN3 : out INTEGER,
-- Distance Calculator I/O Signals
D_CALC_X0, D_CALC_X1, D_CALC_X2 ,
D_CALC_Y0, D_CALC_Y1, D_CALC_Y2 : out INTEGER;
D_CALC_IN: in INTEGER;
-- Trilinear Interp. Module I/O Signals
TRI_START: out STCJLOGIC;
TRI_DONE: in STCJLOGIC;
TRI_X0, TRI_X1, TRI_X2: out INTEGER;
TRI_Z0, TRI_Z1, TRI_Z2: in INTEGER;
ICI_DP_X0, ICI_DP_X1, ICI_DP_X2,
ICI_DP_Y0, ICI_DP_Y1, ICI_DP_Y2,
ICI_DP_GM0, ICI_DP_GM1, ICI_DP_GM2 ,
ICI_DP_MU, ICI_DP_INVDEL, ICI_DP_CURRVAL: out INTEGER;
ICI_DP_NEXT_VAL: in INTEGER);
end component;
component ADDR_CALC
port (INI, IN2, IN3: in INTEGER;
ADDR: out INTEGER);
end component;
component DIS_DP
port (XO, XI, X2 : in INTEGER;
YO, YI, Y2: in INTEGER;
DISOUT: out INTEGER);
end component ;
component ICI_DP
port (XO, XI, X2, YO, YI, Y2 , GMO , GM1, GM2 , MU, INVDEL,
CURRVAL: in INTEGER;
NEXTVAL: out INTEGER);
end component ;
-- ADDR_CALC internal signals
signal INT_ADDR_CALC_IN1 , INT_ADDR_CALC_IN2 , INT_ADDR_CALC_IN3 :
INTEGER;
-- DIS_DP internal signals
signal INT_D_CALC_XO, INT_D_CALC_X1 , INT_D_CALC_X2 ,
INT_D_CALC_YO , INT_D_CALC_Y1 , INT_D_CALC_Y2 ,
INT_D_CALC_IN: INTEGER;
-- ICI_DP internal signals
signal INT_ICI_DP_XO, INT_ICI_DP_X1 , INT_ICI_DP_X2 ,
INT_ICI_DP_YO, INT_ICI_DP_Y1, INT_ICI_DP_Y2 ,
INT_ICI_DP_GMO, INT_ICI_DP_GM1, INT_ICI_DP_GM2 ,
INT_ICI_DP_MU, INT_ICI_DP_INVDEL,
INT_ICI_DP_CURRVAL, INT_ICI_DP_NEXTVAL : INTEGER;
begin
CTRL_BLK: ICI_CTRL
port map (CLK=>CLK,
START=>START,
90
X1=>X1,
X2=>X2,
X3=>X3,
P1_L0_B0UND=>P1_L0_B0UND ,
P1_HI_B0UND=>P1_HI_B0UND ,
P2_L0_B0UND= >P2_L0_B0UND ,
P2_HI_B0UND=>P2_HI_B0UND ,
P3_L0_B0UND= >P3_L0_B0UND ,
P3_HI_B0UND= >P3_HI_B0UND ,
P_SIZE=>P_SIZE,
K_MAX=>K_MAX,
EP=>EP,
DEL=>DEL,
INVDEL_0WD= > INVDEL_0WD ,
INVDEL_TWD= > INVDELJTWD ,
MU=>MU,
DONE=>DONE,
INV_OUT0= >INV_OUT0 ,
INVJOUT1= > INV_OUTl ,
INV_OUT2 = > INV_OUT2 ,
MEMIN=>MEMIN,
MEM_MUX_SEL= >MEM_MUX_SEL ,
ADDR_CALC_IN1=>INT_ADDR_CALC_IN1 ,
ADDR_CALC_IN2=>INT_ADDR_CALC_IN2 ,
ADDR_CALC_IN3=>INT_ADDR_CALC_IN3 ,
D_CALC_XO=>INT_D_CALC_XO ,
D_CALC_X1= > INT_D_CALC_X1 ,
D_CALC_X2 = > INT_D_CALC_X2 ,
D_CALC_YO=>INT_D_CALC_YO ,
D_CALC_Y1=>INT_D_CALC_Y1 ,
D_CALC_Y2=>INT_D_CALC_Y2 ,
D_CALC_IN=>INT_D_CALC_IN,
TRI_START= >TRI_START ,
TRI_DONE=>TRI_DONE ,
TRI_X0=>TRI_X0,
TRI_X1=>TRI_X1,
TRI_X2=>TRI_X2,
TRI_Z0=>TRI_Z0,
TRI_Z1=>TRI_Z1,
TRI_Z2=>TRI_Z2,
ICI_DP_XO=>INT_ICI_DP_XO,
ICI_DP_X1=>INT_ICI_DP_X1,
ICI_DP_X2=>INT_ICI_DP_X2 ,
ICI_DP_YO=>INT_ICI_DP_YO,
ICI_DP_Y1=>INT_ICI_DP_Y1,
ICI_DP_Y2=>INT_ICI_DP_Y2 ,
ICI_DP_GMO=>INT_ICI_DP_GMO ,
ICI_DP_GM1=>INT_ICI_DP_GM1 ,
ICI_DP_GM2=>INT_ICI_DP_GM2 ,
ICI_DP_MU=>INT_ICI_DP_MU,
ICI_DP_INVDEL=>INT_ICI_DP_INVDEL,
IC I_DP_CURRVAL= > INT_ICI_DP_CURRVAL ,
ICI_DP_NEXT_VAL=>INT_ICI_DP_NEXTVAL) ;
ADDR_CALC_BLK: ADDR_CALC
port map (IN1=>INT_ADDR_CALC_IN1,
IN2=>INT_ADDR_CALC_IN2 ,
91
IN3=>INT_ADDR_CALC_IN3 ,
ADDR=>MEM_ADDR) ;
DIS_DP_BLK: DIS_DP
port map (X0=>INT_D_CALC_X0 ,
X1 = >INT_D_CALC_X1 ,
X2 = > INT_D_CALC_X2 ,
Y0=>INT_D_CALC_Y0 ,
Y1=>INT_D_CALC_Y1 ,
Y2 = > INT_D_CALC_Y2 ,
DISOUT=>INT_D_CALC_IN) ;
ICI_DP_BLK: ICI_DP
port map (X0=>INT_ICI_DP_X0,
X1=>INT_ICI_DP_X1,
X2=>INT_ICI_DP_X2 ,
Y0=>INT_ICI_DP_Y0,
Y1=>INT_ICI_DP_Y1,
Y2=>INT_ICI_DP_Y2 ,
GMO=>INT_ICI_DP_GMO,
GM1=>INT_ICI_DP_GM1 ,
GM2=>INT_ICI_DP_GM2 ,
MU=>INT_ICI_DP_MU,
INVDEL=>INT_ICI_DP_INVDEL,
CURRVAL=>INT_ICI_DP_CURRVAL,
NEXTVAL=>INT_ICI_DP_NEXTVAL)
end STRUCTURAL;
92
9.2.8 ici ctrl.vhd
-- Description: ICI Control Block
-- Filename: ici_ctrl.vhd
-- Revisions: 4/18/02 Creation.
library IEEE;
use IEEE.STD_LOGIC_1164.all;
entity ICI_CTRL is
port (CLK, START: in STCJLOGIC;
XI, X2, X3 : in INTEGER;
- - Input Parameters
P1_L0_B0UND, P1_HI_B0UND,
P2_LO_BOUND, P2_HI_B0UND,
P3_L0_B0UND, P3_HI_B0UND,
P_SIZE,
K_MAX,
EP,
DEL, INVDEL_0WD, INVDEL_TWD,
MU: in INTEGER;
DONE: out STCJLOGIC;
INV_OUT0, INV_OUTl, INV_0UT2 : out INTEGER;
-- External Memory Signals
MEMIN: in INTEGER;
MEM_MUX_SEL: out STCJLOGIC;
-- Memory Address Calculator I/O Signals
ADDR_CALC_IN1 , ADDR_CALC_IN2 , ADDR_CALC_IN3 : out INTEGER,
-- Distance Calculator I/O Signals
D_CALC_X0 , D_CALC_X1 , D_CALC_X2 ,
D_CALC_Y0, D_CALC_Y1, D_CALC_Y2 : out INTEGER;
D_CALC_IN: in INTEGER;
-- Trilinear Interp. Module I/O Signals
TRI_START: out STCJLOGIC;
TRI_D0NE: in STCJLOGIC;
TRI_X0, TRI_X1, TRI_X2: out INTEGER;
TRI_Z0, TRI_Z1, TRI_Z2: in INTEGER;
ICI_DP_X0, ICI_DP_X1, ICI_DP_X2,
ICI_DP_Y0, ICI_DP_Y1, ICI_DP_Y2,
ICI_DP_GM0, ICI_DP_GM1, ICI_DP_GM2 ,
ICI_DP_MU, ICI_DP_INVDEL, ICI_DP_CURRVAL : out INTEGER;
ICI_DP_NEXT_VAL: in INTEGER) ;
end ICI_CTRL;
architecture BEHAVIORAL of ICI_CTRL is
signal INT_ADD1_RESULT, INT_ADD1A, INT_ADD1B: INTEGER;
93
begin
process
variable I, J, K: INTEGER;
type D_ARRAY is array (NATURAL range 0 to 3)
type I_ARRAY is array (NATURAL range 0 to 3)
variable CLOSED: D_ARRAY;
variable CLOSEIDX: I_ARRAY;
Of INTEGER;
Of NATURAL;
variable TEMP_C, TEMP_M, TEMP_Y,
TEMP_C_I, TEMP_M_I, TEMP_Y_I : INTEGER;
variable CLOSED_FINAL : INTEGER;
variable ADD1_IN, ADD1_C0NST: INTEGER;
variable FD_FLAG, BD_FLAG: BOOLEAN;
variable SUB1_INA, SUB1_INB,
SUB2_INA, SUB2_INB,
SUB3_INA, SUB3_INB: INTEGER;
variable TEMPJ3M0, TEMP_GM1 , TEMP_GM2 ,
TEMP_GM3 , TEMPJ3M4 , TEMP_GM5 ,
TEMP_GM6, TEMP_GM7, TEMP_GM8 : INTEGER;
variable CURR_C, CURR_M, CURR_Y: INTEGER;
variable NEXT_C, NEXT_M, NEXT_Y: INTEGER;
variable C_INVDEL, M_INVDEL, Y_INVDEL: INTEGER;
variable HI BOUND, LO BOUND: INTEGER;
begin
loop
wait until CLK 'event and CLK=
DONE <= ' 0 ' ;
if START = ' 0 ' then
1'
CLOSED (0)
CLOSED (1)
CLOSED (2)
CLOSED (3)
D_CALC_X0 <=
D_CALC_X1 <=
D_CALC_X2 <=
else
CLOSED (0)
CLOSED (1)
CLOSED (2)
CLOSED (3)
D_CALC_X0 <=
D_CALC_X1 <=
D CALC X2 <=
2000000000
2000000000
2000000000
2000000000
XI
X2
X3
2000000000
2000000000
2000000000
2000000000
XI
X2
X3
-- COMPUTE INITIAL ESTIMATE
-- FIRST WE MUST GET THE CLOSEST POINTS TO X
I:=0;
MEM MUX SEL <= ' 0 ' ;
while I < P_SIZE loop
ADDR_CALC_IN1 <= 6
ADDR_CALC_IN2 <= I
ADDR CALC IN3 <= 3
94
wait until CLK 'event and CLK= ' 1
ADDR_CALC_IN1 <= 6,
ADDR_CALC_IN2 <= I
ADDR_CALC_IN3 <= 4,
wait until CLK' event and CLK=
D_CALC_YO <= MEMIN;
ADDR_CALC_IN1 <=
ADDR_CALC_IN2 <= I;
ADDR_CALC_IN3 <= 5,
wait until CLK 'event and CLK=
D_CALC_Y1 <= MEMIN;
wait until CLK'event and CLK= ' 1 ' ;
D_CALC_Y2 <= MEMIN;
wait until CLK'event and CLK= ' 1 '
wait until CLK'event and CLK= ' 1 '
wait until CLK'event and CLK= ' 1 '
wait until CLK'event and CLK= ' 1 '
wait until CLK'event and CLK= ' 1 '
if D_CALC_IN < CLOSED (0) then
CLOSED (3) := CLOSED (2);
CLOSED (2) := CLOSED (1);
CLOSED (1) := CLOSED (0);
CLOSED (0) := D_CALC_IN;
CLOSEIDX(3) := CLOSEIDX(2) ;
CLOSEIDX(2) := CLOSEIDX(l)
CLOSEIDX(l) := CLOSEIDX(O);
CLOSEIDX(O) := I;
elsif D CALC IN < CLOSED (1) then
CLOSED (3) :=
CLOSED (2) :=
CLOSED (1) :=
CLOSEIDX(3)
CLOSEIDX (2)
CLOSEIDX(l)
elsif D CALC
CLOSED (2)
CLOSED (1)
D_CALC_IN
= CLOSEIDX(2) ;
= CLOSEIDX(l) ;
= I;
IN < CLOSED (2) then
CLOSED (3) := CLOSED (2);
CLOSED ( 2 ) : = D_CALC_IN ;
CLOSEIDX(3) := CLOSEIDX(2)
CLOSEIDX(2) := I;
elsif D_CALC_IN < CLOSED (3)
CLOSED (3) := D_CALC_IN;
CLOSEIDX(3) := I;
end i f ;
then
I: =1 +1;
end loop;
I:=0;
CLOSED_FINAL := 2000000000;
while I < 4 loop
ADDR_CALC_IN1 <= 6;
ADDR_CALC_IN2 <= CLOSEIDX(I);
95
ADDR_CALC_IN3 <= 0;
wait until CLK'event and CLK= ' 1 ' ;
ADDR_CALC_IN1 <= 6;
ADDR_CALC_IN2 <= CLOSEIDX(I);
ADDR_CALC_IN3 <= 1 ;
wait until CLK'event and CLK= ' l ' ;
TEMP_C := MEMIN;
ADDR_CALC_IN1 <= 6;
ADDR_CALC_IN2 <= CLOSEIDX(I);
ADDR_CALC_IN3 <= 2 ;
wait until CLK'event and CLK= ' 1 ' ;
TEMP_M := MEMIN;
wait until CLK'event and CLK= ' 1 ' ;
TEMP_Y := MEMIN;
K:=0;
while K < 3 loop
J:=-4;
while J < 5 loop
INT_ADD1B <= J*DEL;
if K=0 then
INT_ADD1A <= TEMP_C;
elsif K=l then
INT_ADD1A <= TEMP_M;
else
INT_ADD1A <= TEMP_Y;
end if;
wait until CLK'event and CLK= ' 1 ' ;
if K=0 then
if INT_ADD1_RESULT > Pl_HI_BOUND then
TEMP_C_I := Pl_HI_BOUND;
elsif INT_ADD1_RESULT < P1_L0_B0UND then
TEMP_C_I := P1_L0_B0UND;
else
TEMP_C_I := INT_ADD1_RESULT;
end if;
TEMP_M_I := TEMP_M;
TEMP_Y_I := TEMP_Y;
elsif K=l then
TEMP_C_I := TEMP_C;
if INT_ADD1_RESULT > P2_HI_B0UND then
TEMP_M_I := P2_HI_B0UND;
elsif INT_ADD1_RESULT < P2_L0_B0UND then
TEMP_M_I := P2_L0_B0UND;
else
TEMP_M_I := INT_ADD1_RESULT;
end if;
TEMP_Y_I : = TEMP_Y ;
else
TEMP_C_I : = TEMP_C ;
TEMP_M_I : = TEMP_M ;
96
if INT_ADD1_RESULT > P3_HI_B0UND then
TEMP_Y_I := P3_HI_B0UND;
elsif INT_ADD1_RESULT < P3_L0_B0UND then
TEMP_Y_I := P3_L0_B0UND;
else
TEMP_Y_I := INT_ADD1_RESULT;
end if;
end if;
TRI_XO <= TEMP_C_I;
TRI_X1 < = TEMP_M_I;
TRI_X2 <= TEMP_Y_I;
TRI_START <= '1' ;
MEM_MUX_SEL <= ' 1 ' ;
loop
wait until CLK'event and CLK= ' 1 ' ;
TRI_START <= '0' ;
if (TRI_D0NE = ' 1') then
MEM_MUX_SEL <= ' 0 ' ;
exit;
end if;
end loop;
D_CALC_X0 <= XI;
D_CALC_X1 <= X2,
D_CALC_X2 <= X3,
D_CALC_Y0 <= TRI_Z0;
D_CALC_Y1 <= TRI_Z1;
D_CALC_Y2 <= TRI_Z2;
wait until CLK'event and CLK=
' 1 '
wait until CLK'event and CLK=
' 1 '
wait until CLK'event and CLK=
' 1 '
until CLK'event and CLK=
' 1 'wait
wait until CLK'event and CLK=
if D CALC IN < CLOSED_FINAL then
CURR_C
CURR_M
CURR_Y
CLOSED_FINAL
end if;
= TEMP_C_I
= TEMP_M_I
= TEMP_Y_I,
D CALC IN;
J:=J+1;
end loop;
K:=K+1;
wait until CLK'event and
CLK= ' 1 ' ;
end loop;
I:=I+1;
end loop;
-- INITIAL ESTIMATE AVAILABLE
-- COMPUTE GRADIENT MATRIX
I:=0;
while (I<=2) loop -- Gradient Matrix
if (1=0) then
97
ADD1_IN := CURR_C;
HI_BOUND := P1_HI_B0UND;
L0_B0UND := P1_L0_B0UND;
elsif (1=1) then
ADD1_IN := CURR_M;
HI_BOUND := P2_HI_B0UND;
LO_BOUND := P2_L0_B0UND;
else
ADD1_IN := CURR_Y;
HI_B0UND := P3_HI_B0UND;
L0_B0UND := P3_L0_B0UND;
end if;
INT_ADD1A <= HI_BOUND;
INT_ADD1B <= -DEL;
wait until CLK'event and CLK= ' 1 ' ;
if (ADD1_IN <= INT_ADD1_RESULT) then
INT_ADD1B <= DEL;
FD_FLAG := TRUE;
else
INT_ADD1B <= 0;
FD_FLAG := FALSE;
end i f ;
INT_ADD1A <= ADD1_IN;
wait until CLK'event and CLK= ' 1 ' ;
if (1=0) then
TRI_X0 <= INT_ADD1_RESULT;
TRI_X1 <= CURR_M;
TRI_X2 <= CURR_Y;
elsif (1=1) then
TRI_X0 <= CURR_C;
TRI_X1 <= INT_ADD1_RESULT;
TRI_X2 <= CURR_Y;
else
TRI_X0 <= CURR_C;
TRI_X1 <= CURR_M;
TRI_X2 <= INT_ADD1_RESULT;
end if;
INT_ADD1A <= LO_BOUND;
INT_ADD1B <= DEL;
TRI_START <= ' 1 ' ;
MEM_MUX_SEL <= ' 1 ' ;
loop
wait until CLK'event and CLK=
' 1 ' ;
TRI_START <= ' 0 ' ;
if (TRI_DONE = ' 1
' ) then
MEM_MUX_SEL <=
' 0 ' ;
exit ;
end if;
end loop;
SUB1_INA := TRI_Z0;
98
SUB2_INA := TRI_Z1;
SUB3_INA := TRI_Z2 ;
if (ADD1_IN >= INT_ADD1_RESULT) then
INT_ADD1B <= -DEL;
BD_FLAG := TRUE;
else
INT_ADD1B <= 0;
BD_FLAG := FALSE;
end if;
INT_ADD1A <= ADD1_IN;
wait until CLK'event and CLK= ' 1 ' ;
if (1=0) then
TRI_X0 <= INT_ADD1_RESULT;
TRI_X1 <= CURR_M;
TRI_X2 <= CURR_Y;
elsif (1=1) then
TRI_X0 <= CURR_C;
TRI_X1 <= INT_ADD1_RESULT;
TRI_X2 <= CURR_Y;
else
TRI_X0 <= CURR_C;
TRI_X1 <= CURR_M;
TRI_X2 <= INT_ADD1_RESULT;
end if;
TRI_START <= ' 1 ' ;
MEM_MUX_SEL <= '1';
loop
wait until CLK'event and CLK= ' 1 '
TRI_START <= ' 0 ' ;
if (TRI_DONE = ' 1') then
MEM_MUX_SEL <= ' 0';
exit;
end i f ;
end loop;
SUB1_INB := TRI_Z0;
SUB2_INB := TRI_Z1;
SUB3_INB := TRI_Z2 ;
if 1=0 then
TEMP_GM0 := SUB1_INA - SUB1_INB;
TEMP_GM1 := SUB2_INA - SUB2_INB;
TEMP_GM2 := SUB3_INA - SUB3_INB;
if FD_FLAG = TRUE and BD_FLAG = TRUE then
C_TNVDEL := INVDELJTWD;
else
C_INVDEL := INVDEL_OWD;
end if;
elsif 1=1 then
TEMPJ3M3 := SUB1_INA - SUB1_INB;
TEMP~GM4 := SUB2_INA - SUB2_INB;
TEMP_GM5 := SUB3_INA - SUB3_INB;
if FD_FLAG = TRUE and BD_FLAG = TRUE then
99
else
M_INVDEL := INVDELJTWD;
else
M_INVDEL := INVDEL OWD;
end i f ;
TEMP_GM6 : = SUB1_INA - SUB1 INB,
TEMP_GM7 : = SUB2_INA - SUB2 INB,
TEMP_GM8 : = SUB3_INA - SUB3 INB,
if FD FLAG == TRUE and BD FLAG = TRUE then
Y_INVDEL := INVDELJTWD;
else
Y_INVDEL := INVDELJDWD;
end if;
end if;
I: =1+1;
end loop;
- GRADIENT MATRIX AVAILABLE
-- BEGIN ICI ITERATION
J:=0;
while J <= K_MAX loop
TRI_X0 <= CURR_C;
TRI_X1 <= CURR_M;
TRI_X2 <= CURR_Y;
TRI_START <= ' 1 ' ;
MEM MUX SEL <= ' 1 ' ;
loop
wait until CLK'event and CLK= ' 1 ' ;
TRI_START <= ' 0 ' ;
if (TRI_DONE = ' 1') then
MEM_MUX_SEL <= ' 0 ' ;
exit;
end i f ;
end loop;
D_CALC_X0
D_CALC_X1
D CALC X2
XI;
X2;
X3;
D_CALC_Y0 <= TRI_Z0
D_CALC_Y1 <= TRI_Z1,
D_CALC_Y2 <= TRI_Z2,
wait until CLK'event and CLK=
wait until CLK'event and CLK=
wait until CLK'event and CLK=
wait until CLK'event and CLK=
wait until CLK'event and CLK=
if D_CALC_IN <= EP then
INV_OUT0 <= CURR_C;
INV_0UT1 <= CURR_M;
INV_0UT2 <= CURR_Y;
exit;
else
I:=0;
while I < 3 loop
100
ICI DP XO <= XI
ICI_DP_X1 < = X2;
ICI_DP_X2 <= X3;
ICI_DP_YO < = TRI_ ZO;
ICI_DP_Y1 < = TRI_~Z1;
ICI_DP_Y2 < =
ICI_DP_MU <= MU;
if 1=0 then
ICI_DP_GM0 <= TEMPJ3M0,
ICI_DP_GM1 <= TEMP_GM1,
ICI_DP_GM2 <= TEMP_GM2,
ICI_DP_INVDEL <= C_INVDEL;
ICI_DP_CURRVAL <= CURR_C;
elsif 1=1 then
ICI_DP_GM0 <= TEMP_GM3,
ICI_DP_GM1 <= TEMP_GM4,
ICI_DP_GM2 <= TEMPJ3M5,
ICI_DP_INVDEL <= M_INVDEL;
ICI_DP_CURRVAL <= CURR_M;
else
ICI_DP_GM0 <= TEMPJ3M6;
ICI_DP_GM1 <= TEMP_GM7;
ICI_DP_GM2 <= TEMPJ3M8;
ICI_DP_INVDEL <= Y_INVDEL;
ICI_DP_CURRVAL <= CURR_Y;
end i f
wait until CLK'event and CLK= ' 1
wait until CLK'event and CLK= ' 1
wait until CLK'event and CLK= ' 1
wait until CLK'event and CLK= ' 1
if 1=0 then
if ICI_DP_NEXT_VAL > Pl_HI_BOUND then
CURR_C := Pl_HI_BOUND;
elsif ICI_DP_NEXT_VAL < Pl_LO_BOUND then
CURR_C := Pl_LO_BOUND;
else
CURR_C := ICI_DP_NEXT_VAL;
end if;
elsif 1=1 then
if ICI_DP_NEXT_VAL > P2_HI_BOUND then
CURR_M := P2_HI_BOUND;
elsif ICI_DP_NEXT_VAL < P2_LO_BOUND then
CURR_M := P2_LO_BOUND;
else
CURR_M := ICI_DP_NEXT_VAL;
end if;
else
if ICI_DP_NEXT_VAL > P3_HI_BOUND then
CURR_Y := P3_HI_BOUND;
elsif ICI_DP_NEXT_VAL < P3_LO_BOUND then
CURR_Y := P3_LO_BOUND;
else
CURR_Y := ICI_DP_NEXT_VAL;
end if;
101
end if;
I:=I+1;
end loop;
end i f ;
J:=J+1;
end loop;
INV_OUT0 <= CURR_C
INVJ0UT1 <= CURR_M
INV_0UT2 <= CURR_Y
DONE <= ' 1 ' ;
end if; -- start
end loop; -- process loop
end process;
-- Shared utility adder used by Ctrl block
INT_ADD1_RESULT <= INT_ADD1A + INT_ADD1B;
end BEHAVIORAL;
102
9.2.9 ici_dp.vhd
-- Description: Main ICI Loop Datapath
-- Filename: ici_dp.vhd
-- Revisions: 4/13/02 Creation.
library IEEE, DWARE;
--library IEEE;
use IEEE. STD_LOGIC_1164. all;
use IEEE. NUMERICJSTD. all;
use DWARE. DW_Foundat ion. all;
entity ICI_DP is
port (XO, XI, X2, YO, YI, Y2 , GMO , GM1 , GM2 , MU, INVDEL, CURRVAL: in
INTEGER;
NEXTVAL: out INTEGER) ;
end ICI_DP;
architecture BEHAVIORAL of ICIJDP is
signal TEMP_0 , TEMP_1 , TEMP_2 , TEMP_M1 : INTEGER;
begin
TEMP_0 <= (XO - YO) * GMO
TEMP_1 <= (XI - YI) * GM1
TEMP_2 <= (X2 - Y2) * GM2
TEMP_M1 <= MU * INVDEL;
NEXTVAL <= CURRVAL + TCJINTEGER (TCJSIGNED ( ( (TEMP_0 + TEMP_1 + TEMP_2 )
* TEMP_M1) , 32) / TCJSIGNED (1000000, 32));
end BEHAVIORAL;
103
9.2.10 dis_dp.vhd
-- Description: Distance Calculator Datapath
-- Filename: dis_dp.vhd
-- Revisions: 4/13/02 Creation.
library IEEE, DWARE;
--library IEEE;
use IEEE. STCJLOGIC_1164. all;
use IEEE. NUMERIC_STD. all;
use IEEE.MATH_REAL.all;
use DWARE . DW_Foundat ion .all;
entity DIS_DP is
port (XO, XI, X2 : in INTEGER;
YO, YI, Y2: in INTEGER;
DISOUT: out INTEGER);
end DIS_DP;
architecture BEHAVIORAL of DIS_DP is
signal INT_D1, INT_D2 , INT_D3 : INTEGER;
begin
INT_D1 <= XO - YO
INT_D2 <= XI - YI
INT_D3 <= X2 - Y2
DISOUT <= TCJINTEGER (SORT (TCJSIGNED ( ( (INT_D1*INT_D1) +
( INT_D2 * INT_D2 ) + ( INT_D3 * INT_D3 ) ) , 3 2 ) ) ) ;
-- For simulation purposes
DISOUT <= INTEGER (SQRT (REAL ( (INT_D1*INT_D1) + (INT_D2*INT_D2 ) +
(INT_D3*INT_D3) ) ) ) ;
end BEHAVIORAL;
104
9.2.11 addr calc.vhd
-- Description: Memory Address Calculator Datapath
-- Filename: addr_calc . vhd
library IEEE;
use IEEE. STD_LOGIC_1164. all;
entity ADDR_CALC is
port (INI, IN2, IN3 : in INTEGER;
ADDR: out INTEGER) ;
end ADDR_CALC;
architecture BEHAVIORAL of ADDR_CALC is
begin
ADDR <= (INI * IN2) + IN3 ;
end BEHAVIORAL;
105
9.2.12 tablemem.vhd
-- Description: Memory module to hold the look-up table.
-- Filename: tablemem.vhd
-- Revisions: 3/13/02 Creation.
-- NOTE: This memory module is not intended for synthesis.
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE. STD_LOGIC_ARITH. all;
use STD.TEXTIO.ALL;
entity TABLEMEM is
generic (MEMSIZE: NATURAL (6*2197)
port (CLK: in STCJLOGIC;
WR: in STCJLOGIC;
ADDR: in NATURAL;
DIN: in INTEGER;
DOUT: out INTEGER);
Read: WR=0, Write: WR=1
end TABLEMEM;
architecture BEHAVIORAL of TABLEMEM is
type MEM_ARRAY is array (NATURAL range 0 to MEMSIZE) of INTEGER,
begin
LOAD: process
file LUT1: text is in "cmy.dat";
file LUT2 : text is in "lab.dat";
variable L: LINE;
variable IDX: INTEGER;
variable MEM: MEM_ARRAY;
begin
IDX := 0;
while not endf ile (LUT1) loop
-- Read from CMY
readline(LUTl, L)
read(L, MEM (IDX) ) ;
IDX := IDX + 1;
read(L, MEM (IDX) ) ;
IDX := IDX + 1;
read(L, MEM (IDX) ) ;
IDX := IDX + 1;
-- Read from LAB
readline(LUT2, L) ;
read(L, MEM (IDX) ) ;
IDX := IDX + 1;
read(L, MEM (IDX) ) ;
IDX := IDX + 1;
read(L, MEM (IDX) ) ;
106
IDX := IDX + 1;
end loop,-
loop
if (WR = '1' ) then
MEM (ADDR) := DIN;
else
if (ADDR >= 0) then
DOUT <= MEM (ADDR) ;
end if;
end if;
wait until CLK'event and CLK= 1 ' ;
end loop;
end process LOAD;
end BEHAVIORAL;
107
