A multi-ASIC real-time implementation of the two dimensional affine transform with a bilinear interpolation scheme by Bentum, Mark J. et al.
Journal of VLSI Signal Processing, 10, 261-273 (1995)
© 1995 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.
A Multi-ASIC Real-Time Implementation of the Two Dimensional Affine 
Transform with a Bilinear Interpolation Scheme 
MARK J. BENTUM, MARTIN M. SAMSOM AND CORNELIS H. SLUMP 
University of Twente, Department of Electrical Engineering, Laboratory for Network Theory and VLSI Design, 
P.O. Box 217, 7500 AE Enschede, The Netherlands
Received May 25, 1993; Revised October 19, 1994 
Abstract. Some image processing applications (e.g. computer graphics and robot vision) require the rotation, scaling and translation of digitized images in real-time (25-30 images per second). Today's standard image processors cannot meet this timing constraint, so other solutions have to be considered. This paper describes a multi-ASIC solution that performs these image processing tasks in real-time. The first ASIC is a so-called affine transformer, which calculates a one-dimensional coordinate every 25 ns. The second ASIC is a bilinear interpolator, which calculates an interpolated value from four known surrounding values, again every 25 ns. This ASIC has a modular setup, which makes the accuracy of the interpolation flexible: if more accurate interpolation is required, another ASIC (containing an interpolation stage) is added. In this way each application gets the accuracy it needs, with optimal silicon area utilization. Using two affine transformers (to obtain a two-dimensional coordinate pair) and an interpolator, one can build a system that translates, rotates and scales an image of size 1024 x 1024 in real-time (25-30 images per second). In this paper the system as well as the design of the ASICs are presented.
1 Introduction
The number of digital image processing applications is growing fast [1], [2], mainly due to the strongly increasing performance of computers. However, real-time applications are still difficult to implement, so dedicated solutions have to be considered. Here we address the problem of the real-time manipulation of images: we want to rotate, translate and scale an image of size 1024 x 1024 at a rate of 25-30 images per second. Figure 1 shows an example of image manipulation.

This technique of manipulating an image in real-time can be used in many applications, such as medical applications (e.g. real-time X-ray), industrial applications (e.g. robot vision) and consumer electronics (e.g. the digital video camera).

This paper consists of six sections. Section 2, which follows this introduction, presents the mathematical background of the system. The efficient implementation of the algorithm in VLSI is discussed in section 3. In section 4 we discuss our design methodology. We consider this section to be very important because it presents a methodology which we think results in a first-time-correct design. In section 5 the results are shown. We end this paper with a discussion in section 6.

2 Mathematical Background
In this section the mathematical background of the problem is addressed. The system to be designed must be capable of rotating, translating and scaling an image. This process can be described as a coordinate transformation. If (i, j) are the coordinates of a pixel in the input image and (x, y) the corresponding coordinates in the output image, we can write the following matrix equation describing the desired operations:

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} + \begin{pmatrix} E \\ F \end{pmatrix} \tag{1}$$
A through F are constants containing the information about the amount of rotation, translation and scaling. If the origin of the image is in the upper left corner, we can write the constants A through F as follows:

$$\begin{aligned} A &= S_x \cos\alpha, & B &= -S_x \sin\alpha, \\ C &= S_y \sin\alpha, & D &= S_y \cos\alpha, \\ E &= O_x\,(1 - S_x \cos\alpha) + O_y S_x \sin\alpha + X, \\ F &= O_y\,(1 - S_y \cos\alpha) - O_x S_y \sin\alpha + Y \end{aligned} \tag{2}$$
Fig. 1. Example of the manipulation of an image.
Fig. 2. An example of the parameters for image manipulation.
Here α is the counterclockwise rotation angle of the image (with the center of the image (Ox, Oy) as the rotation axis), Sx and Sy are the scale factors, and X and Y are the pixel translation distances in the x and y directions. Fig. 2 gives an example of the parameters (Ox, Oy), X, Y and α.
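The following Python sketch evaluates Eq. 1 with the constants of Eq. 2, as a quick numerical check that the rotation center (Ox, Oy) is mapped onto (Ox + X, Oy + Y). The function names `forward_constants` and `forward` are illustrative only; angles are in radians.

```python
import math


def forward_constants(alpha, sx, sy, ox, oy, tx, ty):
    """Constants A..F of Eq. 2 for rotation angle alpha (radians), scale
    factors sx, sy, rotation center (ox, oy) and translation (tx, ty)."""
    c, s = math.cos(alpha), math.sin(alpha)
    A, B = sx * c, -sx * s
    C, D = sy * s, sy * c
    # E and F place the rotation center at (ox + tx, oy + ty) in the output.
    E = ox * (1 - sx * c) + oy * sx * s + tx
    F = oy * (1 - sy * c) - ox * sy * s + ty
    return A, B, C, D, E, F


def forward(consts, i, j):
    """Source scanning (Eq. 1): output coordinate (x, y) of input pixel (i, j)."""
    A, B, C, D, E, F = consts
    return A * i + B * j + E, C * i + D * j + F


consts = forward_constants(math.radians(30), 2.0, 2.0, 512, 512, 10, -5)
x, y = forward(consts, 512, 512)  # the center lands (up to rounding) on (522, 507)
```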
Some applications require the constants to be recalculated for every frame. Because the maximum frame rate is 30 frames per second, these 6 constants have to be calculated every 33 ms. The calculation of these constants can easily be done by a general purpose processor (a PC or low performance workstation). The actual image addresses, however, cannot be computed in real-time by such a general purpose processor (see section 3).
In Eq. 1 the input image is scanned, yielding pixel information for the output image. This method is often referred to as source scanning. Source scanning is seldom used, since it results in holes in the output image (output pixels onto which no input pixel is mapped) and in overlapping pixels (several input pixels mapped onto the same output pixel). Another disadvantage is the necessary implementation of four multiplications and four additions for every pixel in the input image, which would require a large area in a VLSI implementation.
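A more efficient method is screen scanning, in which the output image is scanned. Eq. 3 gives the inverse transformation:

$$\begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} A' & B' \\ C' & D' \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} E' \\ F' \end{pmatrix} \tag{3}$$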
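Again (x, y) are the coordinates in the output image and (i, j) are the coordinates in the input image, but now the input coordinates are calculated corresponding to a given output coordinate pair. The constants A' through F' are of course different from A through F; inverting Eq. 1 gives:

$$\begin{aligned} A' &= \frac{\cos\alpha}{S_x}, & B' &= \frac{\sin\alpha}{S_y}, \\ C' &= -\frac{\sin\alpha}{S_x}, & D' &= \frac{\cos\alpha}{S_y}, \\ E' &= O_x\left(1 - \frac{\cos\alpha}{S_x}\right) - X\frac{\cos\alpha}{S_x} - Y\frac{\sin\alpha}{S_y} - O_y\frac{\sin\alpha}{S_y}, \\ F' &= O_y\left(1 - \frac{\cos\alpha}{S_y}\right) + X\frac{\sin\alpha}{S_x} + O_x\frac{\sin\alpha}{S_x} - Y\frac{\cos\alpha}{S_y} \end{aligned} \tag{4}$$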
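The inverse constants of Eq. 4 can be checked the same way: for screen scanning, the output position of the rotation center, (Ox + X, Oy + Y), must map back onto (Ox, Oy). The function names below are illustrative; in the system described here these six values are computed once per frame by the host PC.

```python
import math


def inverse_constants(alpha, sx, sy, ox, oy, tx, ty):
    """Constants A'..F' of Eq. 4 (alpha in radians)."""
    c, s = math.cos(alpha), math.sin(alpha)
    a_p, b_p = c / sx, s / sy
    c_p, d_p = -s / sx, c / sy
    e_p = ox * (1 - c / sx) - tx * c / sx - ty * s / sy - oy * s / sy
    f_p = oy * (1 - c / sy) + tx * s / sx + ox * s / sx - ty * c / sy
    return a_p, b_p, c_p, d_p, e_p, f_p


def source_coordinate(inv, x, y):
    """Screen scanning (Eq. 3): input coordinate (i, j) for output pixel (x, y)."""
    a_p, b_p, c_p, d_p, e_p, f_p = inv
    return a_p * x + b_p * y + e_p, c_p * x + d_p * y + f_p
```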
Again these constants can easily be computed by the host computer. The difference is that coordinates in the input image are calculated instead of in the output image. A complete output image is created by calculating, for each coordinate pair in the output image, the corresponding coordinate pair in the input image. A raster-scan sequence is used, which gives some advantages for the design of the transformer. Let us start with (x, y) = (0, 0); the next points on the first line are (x, y) = (1, 0), (2, 0), and so on. Writing i_{y,x} for the i coordinate computed for output pixel (x, y), this results in the following scheme for the i coordinate:
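$$\begin{aligned} i_{0,0} &= E' + 0 \times B' + 0 \times A' \\ i_{0,1} &= E' + 0 \times B' + 1 \times A' = i_{0,0} + A' \\ i_{0,2} &= E' + 0 \times B' + 2 \times A' = i_{0,1} + A' \\ &\;\;\vdots \\ i_{1,0} &= E' + 1 \times B' + 0 \times A' = i_{0,0} + B' \\ i_{1,1} &= E' + 1 \times B' + 1 \times A' = i_{1,0} + A' \\ i_{2,0} &= E' + 2 \times B' + 0 \times A' = i_{1,0} + B' \\ &\;\;\vdots \end{aligned} \tag{5}$$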
As can be seen in the scheme above, the coordinates (i, j) can be calculated incrementally.
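The incremental scheme of Eq. 5 can be sketched in Python as follows; the function name is ours. It reproduces the direct evaluation of i = A'x + B'y + E' using only additions: A' is added for the next pixel on a line and B' for the start of the next line. Floating point is used here only for readability.

```python
def raster_i_coordinates(a_p, b_p, e_p, width, height):
    """i = A'x + B'y + E' for every output pixel in raster order, computed
    with additions only (Eq. 5). Returns one list per output line y."""
    rows = []
    line_start = e_p          # i at x = 0 on the current line
    for _ in range(height):
        i = line_start
        row = []
        for _ in range(width):
            row.append(i)
            i += a_p          # next pixel on the same line: add A'
        rows.append(row)
        line_start += b_p     # start of the next line: add B'
    return rows
```

The same routine, called with C', D' and F', yields the j coordinate.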
Fig. 3. Situation in the output image where a pixel value has to be found at P_int.
Unfortunately the calculated coordinates in the input image will generally not be positioned on gridpoints, since the matrix multiplication in Eq. 3 yields coordinates with a fractional part. The final image is an image with pixel values on a regular image matrix, and therefore we need some kind of interpolation to find the actual pixel values. Figure 3 illustrates this situation. The points P1 through P4 are known pixel values and P_int is the point where we want to know the pixel value.
If a Cartesian coordinate system with equally sized steps in both directions is used, the two-dimensional interpolation can be accomplished by a one-dimensional interpolation along each coordinate axis in turn [3]. Therefore we will only discuss the one-dimensional interpolation functions.
The ideal interpolation in one dimension applies an ideal low-pass filter, which is a rectangular function in the frequency domain. In the spatial domain this corresponds to

$$f(x_{int}) = \sum_{k=-\infty}^{\infty} f(k x_s)\,\frac{\sin\left(2\pi f_c\,[x_{int} - k x_s]\right)}{2\pi f_c\,[x_{int} - k x_s]} \tag{6}$$

with f_c = 1/(2 x_s) being the cutoff frequency and x_s being the sampling distance.
Sampling results in a periodically repeated frequency spectrum. If the interpolation function is not an ideal filter, two effects might appear which can cause artifacts in the final image. First, the higher frequencies in the original spectrum will be attenuated. Second, the repeated spectra will alias back into the original spectrum. Aliasing will only occur if the spectrum of the interpolation filter is not zero at frequencies above the cutoff frequency.
From Eq. 6 it follows that an infinite number of pixels should be used to calculate the new pixel value, which is of course impossible to realize. Therefore other interpolation techniques have to be considered.
From a computational point of view, the easiest interpolation algorithm is nearest neighbor interpolation, where each new pixel is given the value of the original pixel which is nearest to it. In the spatial domain this is equivalent to a convolution with a rectangle function, whose frequency response is a sinc function. Therefore higher frequencies are attenuated, and aliasing also occurs. In certain situations, such as diagonal straight lines, this causes some serious artifacts, like intensity steps instead of a continuous line. For medical applications this technique is not accurate enough.
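A minimal nearest neighbor lookup in Python, for illustration. The image is assumed to be a list of rows indexed image[j][i], with i the column and j the line; these conventions and the function name are ours.

```python
def nearest_neighbor(image, i, j):
    """Value of the original pixel nearest to the (non-integer) input
    coordinate (i, j); image[j][i] holds line j, column i."""
    # Adding 0.5 and truncating rounds to the nearest integer for i, j >= 0.
    return image[int(j + 0.5)][int(i + 0.5)]
```

Every output pixel simply copies one input pixel, which is what produces the intensity steps along diagonal lines mentioned above.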
Because of the disadvantages of nearest neighbor interpolation, a bilinear interpolation technique is often used. This technique calculates a new pixel value by a linear interpolation of the four surrounding pixel values. In the case shown in Fig. 3, P_int is equal to:

$$P_{int} = (1-a)(1-b)\,P_1 + (1-a)\,b\,P_2 + a\,(1-b)\,P_3 + a\,b\,P_4 \qquad (0 \le a, b \le 1) \tag{7}$$
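Eq. 7 can be written directly or, following the separability remark earlier in this section [3], as three one-dimensional linear interpolations. The Python sketch below (function names are ours) shows that both give the same value; the pixel positions follow Eq. 7, with P1 at (a, b) = (0, 0), P2 at (0, 1), P3 at (1, 0) and P4 at (1, 1).

```python
def bilinear(p1, p2, p3, p4, a, b):
    """Eq. 7: weights (1-a)(1-b), (1-a)b, a(1-b) and ab for P1..P4."""
    return ((1 - a) * (1 - b) * p1 + (1 - a) * b * p2
            + a * (1 - b) * p3 + a * b * p4)


def lerp(u, v, t):
    """One-dimensional linear interpolation between u (t = 0) and v (t = 1)."""
    return u + t * (v - u)


def bilinear_separable(p1, p2, p3, p4, a, b):
    """Same value as Eq. 7: interpolate along b (P1-P2 and P3-P4),
    then along a between the two intermediate values."""
    return lerp(lerp(p1, p2, b), lerp(p3, p4, b), a)
```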
Linear interpolation amounts to convolution of the sampled data with a triangle function. This triangle function corresponds to a modest low-pass filter in the frequency domain: it attenuates frequencies near the cutoff frequency, resulting in smoothing of the image, and it passes a significant amount of energy above the cutoff frequency.
In principle it is possible to use a higher order interpolation technique, where new points are calculated using at least four points in each direction. More information can be found in [3], [4] and [5]. Although higher order interpolation techniques yield a higher image quality, their computational effort is significantly larger. For this reason we have chosen to implement the bilinear interpolation technique.
3 Technical Approach 
The problem of translating, rotating and scaling an image in real-time can be solved by dividing the system into two parts, a transformer and an interpolator. The image has a size of 1024 by 1024 pixels and the frame rate is at most 30 images per second; at 25 images per second this already amounts to about 26 x 10^6 pixel operations per second. Furthermore, a total of 19 operations (4 multiplications and 4 additions for the transformer and 8 multiplications and 3 additions for the bilinear interpolation) is required per pixel, which results in a computing requirement of 494 MOPS (Mega Operations Per Second). The use of lookup tables might substantially lower this number, but it is still clear that computers do not have enough computing power to perform this task. Developing one or more ASICs is one solution. The idea is to make a printed circuit board (PCB) containing the ASICs and to put this board into a computer. A user interface is developed so that the user can select the six constants A' through F' using mouse and keyboard. These constants are fed to the board once per frame (every 33 to 40 ms at 30 to 25 images per second). This task can easily be performed by a general purpose computer, without the development of dedicated hardware.
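The throughput figures above can be reproduced with a few lines of Python (the function name is ours). The 494 MOPS quoted in the text corresponds to rounding the pixel rate at 25 images per second to 26 x 10^6 before multiplying by the 19 operations per pixel; without that rounding the figure is about 498 MOPS, and about 598 MOPS at 30 images per second.

```python
WIDTH = HEIGHT = 1024
OPS_PER_PIXEL = (4 + 4) + (8 + 3)   # transformer + bilinear interpolator = 19


def requirements(fps):
    """Pixel rate (pixels/s), operation rate (MOPS) and time budget per
    output pixel (ns) for a WIDTH x HEIGHT image at `fps` images per second."""
    pixels_per_s = WIDTH * HEIGHT * fps
    mops = pixels_per_s * OPS_PER_PIXEL / 1e6
    ns_per_pixel = 1e9 / pixels_per_s
    return pixels_per_s, mops, ns_per_pixel


for fps in (25, 30):
    rate, mops, ns = requirements(fps)
    print(f"{fps} images/s: {rate / 1e6:.1f} Mpixel/s, {mops:.0f} MOPS, {ns:.1f} ns/pixel")
# 25 images/s: 26.2 Mpixel/s, 498 MOPS, 38.1 ns/pixel
# 30 images/s: 31.5 Mpixel/s, 598 MOPS, 31.8 ns/pixel
```

The 25 ns per output pixel of the ASICs fits within both time budgets.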
Because the transformer and the interpolator are two different designs, we discuss them separately in the next two subsections. The third subsection addresses the total system.
3.1 Design of the Transformer 
If we look at Eq. 3 we see that there are two similar equations for i and j. Therefore only one of the equations has to be implemented:

$$i = A'x + B'y + E' \tag{8}$$

The screen scanning method, discussed in section 2, is implemented in VLSI. Fig. 4 shows a block diagram of this idea. The necessary multiplications are replaced by repeated additions in a single adder, saving silicon area. The operation is as follows (see Fig. 4): the i-reg register stores the computed address and the Y x B'-reg register stores the start value for each line. First the E' value, which is the start value for the first line, is stored in i-reg and in Y x B'-reg. Now the addresses in the original image for the first line are computed by the repetitive addition of A'. After one line has been computed, B' is added to the previous line start value (Y x B') and the result is saved in the Y x B'-reg register, from which the computation of the next line starts. The addresses in the original image for the next line are then computed by the incremental addition of A'. At the end of the complete image, E' is stored in the Y x B'-reg and i-reg registers again, completing the raster-scan sequence.

Fig. 4. Implementation of the transformer using an incremental computing scheme.
This addressing scheme is implemented in VLSI (actually in two identical chips, one for the i and one for the j coordinate, which results in smaller chips and therefore a higher yield and lower cost). The A' and B' constants are products of the reciprocal zoom factors and the cosine or sine of the angle (see Eq. 4). Therefore the zoom factor determines the maximum value of these constants. Defining a minimum zoom factor Sx = Sy = 1/4 yields a maximum value of 4 for the two constants A' and B', which can be represented by a 3-bit integer part (with the MSB being the sign bit) and an 8-bit fractional part (11 bits in total). The constant E' represents the translation of the output image with respect to the input image. Defining a maximal translation of a quarter of an image yields a 9-bit integer part (including a sign bit) and an 8-bit fractional part, for a total of 17 bits for the E' constant. Intermediate calculations are done with 21 bits. The output address contains 10 bits for the integer part (to address between 0 and 1023) and 2 fractional bits.
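The register-level behavior of Fig. 4 with the word lengths given above can be modelled as below. This is a sketch under stated assumptions: the names `i_reg` and `line_reg` (standing for the Y x B'-reg), round-to-nearest quantization of the constants, and truncation of the accumulator to the 2 output fraction bits are our choices, not details taken from the ASIC. Addresses are returned in units of a quarter pixel, i.e. as the value with 10 integer and 2 fraction bits.

```python
FRAC_BITS = 8      # fractional bits of A', B' and E'
OUT_FRAC_BITS = 2  # fractional bits of the output address
ACC_BITS = 21      # width of the intermediate registers


def to_fixed(value, frac_bits=FRAC_BITS):
    """Quantize a real constant to an integer in units of 2**-frac_bits
    (round to nearest; the ASIC's rounding mode is an assumption)."""
    return round(value * (1 << frac_bits))


def check_width(acc):
    """Assert that a signed value fits in an ACC_BITS-wide register."""
    lo, hi = -(1 << (ACC_BITS - 1)), (1 << (ACC_BITS - 1)) - 1
    assert lo <= acc <= hi, "register overflow"
    return acc


def transformer_line_addresses(a_p, b_p, e_p, width, height):
    """Hypothetical model of Fig. 4: i_reg steps by A' along a line and
    line_reg steps by B' per line. Yields addresses in quarter pixels,
    truncated (floor) from the 8-bit-fraction accumulator."""
    a_q, b_q, e_q = to_fixed(a_p), to_fixed(b_p), to_fixed(e_p)
    line_reg = e_q
    for _ in range(height):
        i_reg = line_reg
        for _ in range(width):
            yield i_reg >> (FRAC_BITS - OUT_FRAC_BITS)
            i_reg = check_width(i_reg + a_q)
        line_reg = check_width(line_reg + b_q)
```

For example, A' = 0.5, B' = 1.0 and E' = 2.0 give the quarter-pixel addresses 8, 10, 12, 14 on the first line (2.0, 2.5, 3.0, 3.5 pixels).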
3.2 Design of the Interpolator 
As discussed before, it is very unlikely that the calculated coordinates in the input image fall onto gridpoints. However, the pixel values are only known on gridpoints. Therefore an interpolation technique is needed to obtain the corresponding pixel value. As discussed in section 2, we use the bilinear interpolation technique.
A direct implementation of the bilinear interpolation technique (Eq. 7) requires 8 multipliers and 3 adders (if we do not count the (1 - a) and (1 - b) calculations). Directly implementing these calculations in hardware would be difficult because of the speed constraint, similar to the problems with the transformer. Therefore other solutions were considered.
Fig. 5. Possible coordinates in case of two fraction bits.
Fig. 6. Bilinear interpolation in stages.
The accuracy of the transformer determines the length of the fractional part of the coordinates (fixed point numbers are used for the coordinates instead of floating point numbers). Therefore only a limited number of coordinate values are possible. When two fraction bits are used, only the points shown in Fig. 5 are addressed, leading to a solution where multipliers can be avoided.
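One way to see why a fixed number of fraction bits removes the need for multipliers: with two fraction bits, a and b in Eq. 7 can only take the values 0, 1/4, 1/2 and 3/4 within a cell, so every weight is a multiple of 1/16, and multiplying a pixel value by it reduces to shifts and additions (for example 3/8 P = P/4 + P/8). The Python sketch below (names are ours) enumerates these positions with exact fractions; it illustrates the argument and is not the authors' circuit.

```python
from fractions import Fraction


def bilinear_weights(a_code, b_code, frac_bits=2):
    """Exact weights of P1..P4 in Eq. 7 for a = a_code / 2**frac_bits
    and b = b_code / 2**frac_bits."""
    n = 1 << frac_bits
    a, b = Fraction(a_code, n), Fraction(b_code, n)
    return ((1 - a) * (1 - b), (1 - a) * b, a * (1 - b), a * b)


def addressable_positions(frac_bits=2):
    """All (a_code, b_code) pairs inside one cell of the pixel grid."""
    n = 1 << frac_bits
    return [(a, b) for a in range(n) for b in range(n)]
```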
As stated in the introduction, the system described in this paper can be used in various applications, and the accuracy demands of these applications differ. A medical application, for instance, requires much more accuracy than a consumer application like the digital video camera. An address accuracy of two fraction bits is typically enough for such consumer applications; for medical applications (e.g. dual energy image processing [4]), however, more than 5 fraction bits are sometimes required. Therefore flexibility in accuracy is required. We have designed a modular setup, so that a flexible accuracy is easily achieved: a simple building block is designed for every fraction bit. So, if four fraction bits are used, four building blocks do the task. This idea is shown in Fig. 6. Fig. 7 gives a block diagram of a one-fraction-bit interpolation.
The bilinear interpolation is built up from different stages of these one-fraction-bit interpolations. In the first stage the intermediate points of the four known points are calculated, which is done by adding the values and doing one right shift operation (see the block diagram in Fig. 7). Now the possible coordinates in the case of 1 fraction bit are calculated. The next step is to select the quadrant in which the requested pixel value is situated. This selection is controlled by the values of the fraction bit in both directions. The quadrant is bounded by four new points, which are the output values of this stage. Note that one of the four points is an original pixel value. If another fraction bit is available, another stage is added, using the output values of the previous stage as input values. The four output values of the last stage are the input values of the selection unit, which selects the final value. By implementing this modular setup, which uses computed intermediate values, only adders and no multipliers are needed, saving silicon area.

Fig. 7. Block diagram of a one-fraction-bit interpolation.
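The staged scheme can be modelled in integer arithmetic as follows. This is a hypothetical Python model of the behavior described above, not the ASIC netlist: corner naming follows Eq. 7 (P1 at (a, b) = (0, 0), P2 at (0, 1), P3 at (1, 0), P4 at (1, 1)), fraction bits are consumed most significant bit first, the center point is taken as the midpoint of two edge midpoints, and we assume that the selection unit outputs the first corner of the last selected quadrant, since that corner coincides with the requested point once all fraction bits are used.

```python
def one_bit_stage(p00, p01, p10, p11, a_bit, b_bit):
    """One-fraction-bit stage: midpoints by addition and a right shift,
    then selection of the quadrant indicated by the two fraction bits."""
    m_a0 = (p00 + p10) >> 1        # point (a, b) = (1/2, 0)
    m_a1 = (p01 + p11) >> 1        # point (1/2, 1)
    m_b0 = (p00 + p01) >> 1        # point (0, 1/2)
    m_b1 = (p10 + p11) >> 1        # point (1, 1/2)
    center = (m_a0 + m_a1) >> 1    # point (1/2, 1/2)
    grid = [[p00, m_b0, p01],
            [m_a0, center, m_a1],
            [p10, m_b1, p11]]
    qa, qb = a_bit, b_bit
    return (grid[qa][qb], grid[qa][qb + 1],
            grid[qa + 1][qb], grid[qa + 1][qb + 1])


def staged_bilinear(p1, p2, p3, p4, a_code, b_code, frac_bits):
    """Cascade `frac_bits` stages (MSB first) and return the selected value."""
    corners = (p1, p2, p3, p4)
    for bit in reversed(range(frac_bits)):
        a_bit = (a_code >> bit) & 1
        b_bit = (b_code >> bit) & 1
        corners = one_bit_stage(*corners, a_bit, b_bit)
    return corners[0]
```

For arbitrary 8-bit pixel values the right shifts truncate, so the result can differ slightly from Eq. 7; the test values below are chosen so that every shift is exact.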
Fig. 8 gives a block diagram of the bilinear interpolation. If more accurate interpolation is required, additional ASICs (each containing an interpolation stage) are cascaded until the required accuracy is achieved.
3.3 Design of the Total System 
Now that the transformer and the interpolator have been discussed, the total system will be addressed. Two transformers are used, which calculate the coordinates in the input image. If the calculated coordinates are not in the address range of the input image, the interpolation is skipped and the output pixel becomes zero (black when using a standard colormap). Otherwise the final value in the output image is calculated from the four surrounding pixel values using the modular bilinear interpolator. Every 25 ns a pixel value is calculated in the output image. With a conventional memory architecture, one memory access per pixel value and four pixel values per output pixel, a memory access time of 25/4 ≈ 6 ns would be required. Because memory with an access time of 6 ns is extremely expensive (if it is available at all), four memory banks are used instead of one. The pixels are stored in such a fashion that only one value from each RAM is needed for every pixel in the output image. This can be achieved by making the following division:
RAM1: Pixels from even lines and even columns
RAM2: Pixels from even lines and odd columns 
RAM3: Pixels from odd lines and even columns 
RAM4: Pixels from odd lines and odd columns 
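The division above guarantees that the four pixels around any interpolation point come from four different RAMs. A short Python check (function names are ours; lines and columns are counted from 0):

```python
def bank_of(line, column):
    """RAM number (1-4) holding pixel (line, column): RAM1 even/even,
    RAM2 even/odd, RAM3 odd/even, RAM4 odd/odd (line/column parity)."""
    return 1 + 2 * (line % 2) + (column % 2)


def banks_of_neighborhood(line0, column0):
    """Banks of the four pixels surrounding an interpolation point whose
    upper-left neighbor is (line0, column0)."""
    return sorted(bank_of(line0 + dl, column0 + dc)
                  for dl in (0, 1) for dc in (0, 1))
```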
Fig. 8. Block diagram of the bilinear interpolation: the known pixel values P1-P4 pass through cascaded one-fraction-bit interpolation stages, each controlled by one fraction bit.
Fig. 9. Illustration of the need for extra selection logic.
Note that the total amount of memory is the same as in the case where one single memory is used, but the memory bandwidth is four times higher. The total amount of RAM is 1024 x 1024 pixels x 8 bits x 2 = 2 MByte (we use a so-called ping-pong memory: one memory is used to write new data in, while the other memory is used for display).
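The timing and size figures of this subsection follow from simple arithmetic; the constant names in this Python fragment are ours.

```python
PIXEL_PERIOD_NS = 25        # one output pixel every 25 ns
PIXELS_PER_OUTPUT = 4       # P1..P4 are needed for every output pixel

# One conventional memory must deliver all four values within one period:
single_memory_access_ns = PIXEL_PERIOD_NS / PIXELS_PER_OUTPUT   # 6.25 ns
# Four banks deliver one value each, so every bank gets the full period:
per_bank_access_ns = PIXEL_PERIOD_NS                            # 25 ns

# Two (ping-pong) frame buffers of 1024 x 1024 pixels with 8 bits per pixel:
total_bytes = 2 * 1024 * 1024 * 8 // 8                          # 2 MByte
```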
The disadvantage of this memory distribution scheme is that some RAM selection logic must be added. Figure 9 illustrates the need for the extra RAM selection logic. In this figure a part of the source image can be seen; the distribution of the pixel values over the RAMs is also depicted. For P_int1 the upper left corner is a value out of RAM1, and the values out of the other RAMs can be selected by the same coordinate pair. For P_int2, however, the situation is different. Rounding the coordinate of P_int2 down yields the correct address for RAM2 and RAM4. For the correct address of RAM1 and RAM3, a coordinate pair which is one position higher than the coordinate pair for RAM2 and RAM4 is needed. The same discussion holds for the other direction. Therefore some logic is needed to generate the correct addresses for the RAMs. Since the situation depends only on whether the computed address is odd or even, the control logic is very easy to design.
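The address correction described above can be sketched as follows (hypothetical function name; lines and columns counted from 0, with the RAM1-RAM4 numbering of the division given earlier). Every bank stores every second line and column, so pixel (l, c) sits at address (l // 2, c // 2) of its bank; whenever the parity of the upper-left neighbor differs from the parity a bank holds, that bank must be read one position higher, which only requires the least significant address bits.

```python
def bank_addresses(line0, column0):
    """Address (row, col) inside each RAM for the 2 x 2 neighborhood whose
    upper-left pixel is (line0, column0)."""
    addresses = {}
    for line_parity in (0, 1):
        for col_parity in (0, 1):
            bank = 1 + 2 * line_parity + col_parity
            # Parity mismatch: this bank's pixel lies one position higher.
            line = line0 + ((line_parity - line0) % 2)
            column = column0 + ((col_parity - column0) % 2)
            addresses[bank] = (line // 2, column // 2)
    return addresses
```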
Fig. 10 shows the setup of the total system. A personal computer with a user interface generates a few control signals for the system as well as the 6 constants A' through F'. Two transformers (ASIC number 1) calculate the coordinates in the source image. If the calculated address is out of the address range of the input image, a white pixel is sent to the output image. With some selection logic the four surrounding known pixel values from the source image are fetched and used in the second ASIC, the interpolator.
4 Design Methodology 
When developing the algorithm and the organization of the system, all possible configurations should be taken into account in order to take full advantage of the use of ASICs. A high-level description of the ASICs and the total system clarifies the interfaces and the functionality of the different parts, and has often proved to be necessary to guarantee a first-time-correct design. In our case, the high-level description was made using MoDL [6], which offers facilities similar to (and beyond) those of VHDL. With this language the total image processing system was described: an image was taken as input, the memory organization was tested, the developed algorithm (divided over several ASICs) was executed and a bitmap was obtained as output. Using this bitmap output, errors could be visualized much more easily than with conventional test vectors.
After the high-level description was finished and proven to be correct (with the help of the algebraic simulation capability of MoDL), the design team split into different groups [7]. One group designed the interpolator ASIC while another group developed the transformer ASIC. A third group designed the PCB. Using the pertinent high-level description as a reference, each group could develop its design towards gate level. A major aspect during this detailing and refinement process was the efficient mapping of the algorithm onto the hardware resources. Whenever an alteration of the algorithm would make better use of the hardware resources, the system description was modified and tested again. The high-level description served as a reference to which the detailed designs could be compared. See also Fig. 11.
The designs were realized in a 1.5 µm double metal, single poly CMOS standard cell technology. Schematic entry of the designs was done with Mentor Graphics' IDEA Station [8]. The designs were simulated with QuickSim [9], and the results were compared with the high-level description to ensure a functionally correct design.

In the design process, the algorithms were mapped onto the available standard cells. Here the designer tried to make this mapping as efficient as possible, taking into account the delay of the different standard cell components, which led to an iterative process of design changes until the timing requirements were met. When the schematic design was correct, the layout generation was done using Mentor Graphics' CELL Station [10]. After the generation of the layout the two ASICs were processed at the foundry. The last step was the design of the printed circuit board for the total system. This PCB contains three of the designed ASICs: a transformer ASIC for both the i- and the j-address and an interpolator ASIC performing a two-step interpolation. Additionally, some control logic, the frame and line synchronization data and the system clock were required.

Fig. 10. Block diagram of the total system.
5 Results 
The total design resulted in a system which performs the translation, rotation and scaling operation at a speed of 25-30 frames per second. Each computed image is interpolated in two steps using a bilinear interpolation technique. By dividing the original image over four memories, the required memory speed is limited, so expensive high-speed memory is not needed. With the computed address, four surrounding pixel values are selected, one from each of the four RAMs. The resulting pixel value after interpolation is stored in a (video) memory. We specified and simulated the total system using a high-level design language (in our case MoDL [6]). From this description we derived a hardware description of the separate ASICs, from which the ASICs could be realized. At this moment the ASICs have been processed and the PCB design is ready. The ASICs as well as the memory and the controller (implemented in an FPGA) fit on a PC-AT board. Table 1 presents the most important parameters of the ASICs. The clock speed in this table is the clock speed needed for manipulating 1024 x 1024 images with a frame time of 40 ms (25 images per second). Fig. 12 shows the floorplans of the two ASICs and Fig. 13 a photograph of the PCB. The socket of the interpolation chip did not fit on the PCB, so an alternative home-made socket has been used, as can be seen in the photograph. The system is completely functional and meets the specifications.

Fig. 11. Design flow: start with the specification of the system; after several ideas a concept is born; description of the complete system and simulation with a test image (the division into subtasks is also made at this level); sequential (and later concurrent) description of each ASIC and simulation with a test image; realization of the ASIC with a design tool (i.e. Mentor Graphics Cell Station); submission of the layout tape to the foundry. A separate symbol marks the decision points.
Table 1. Parameters of the designed ASICs.

Parameter      Transformer         Interpolator
Area           2.93 x 3.15 mm²     3.52 x 3.55 mm²
Clock speed    40 MHz              40 MHz
Pin count      34 pins             51 pins

6 Discussion
Starting the design process from a system point of view, 
we were able to organize the system in such a way that 
expensive hardware was avoided. The use of adders 
instead of multipliers proved to be sufficient here. A 
thorough survey of the system partitioning was made 
beforehand. 
Fig. 12. Floorplans of the ASICs.
Fig. 13. The printed circuit board.
According to the simulations, the bilinear interpolation resulted in a considerable improvement in image quality in comparison with the nearest neighbor interpolation technique.
The high-level description used was found to be of 
great help in the development of the algorithm and the 
specification of the different interfaces. In this way an 
efficient solution to the problem and a first-time-correct 
design could be achieved. 
The described architecture is not only interesting for image processing applications, but also for (real-time) computer graphics systems. Two algorithms heavily used in this field, color interpolation and Gouraud shading, can make use of the developed ASICs. Of course, special purpose hardware has been designed especially for computer graphics systems, as in the Pixel-Planes system [11]. The main difference between the interpolator discussed in this paper and other implementations is that our solution does not use multipliers, which results in a small chip area. Another application field is the visualization of three-dimensional data. In this visualization step we want to map a three-dimensional scene onto a two-dimensional screen [12]. One step in this algorithm is the calculation of sample values by means of interpolation, for which the developed interpolation ASIC can be used.
Further investigations have been done in the field of using the chips for visualizing three-dimensional medical data in real-time [12]. Furthermore, we have looked at the possibility of extending the idea of scalable interpolation to a higher order interpolation: cubic spline interpolation. Since cubic spline interpolation coefficients are defined by a third order polynomial, it is not trivial to implement this interpolation technique in a scalable fashion. However, we have implemented the cubic spline interpolation method in VLSI [13]. With this ASIC it is possible to do real-time cubic spline interpolation on a 512 x 512 image. A prototype system using this ASIC is currently under development.
Acknowledgments 
We would like to thank all members of our group who have been helpful during the design process of the image processing system. Especially we want to thank Geert-Jan Laanstra for his technical support and Hans Snijders for his software and design support. Furthermore we want to thank the students Mark Boerrigter, Marco Bosma, Durk van Veen, Marc Weusting and Nam Tran for making the layouts of the ASICs and the PCB.
We would also like to thank the anonymous referees for their valuable comments, which, we are convinced, have improved the clarity of the presentation and have pointed out interesting related work.
Finally we would like to thank the Eurochip VLSI 
Design Training Action for supplying CAD-software 
and processing facilities. 
The investigations were partly supported by the 
foundation for Computer Science in the Netherlands 
(SION) with financial support from the Netherlands 
Organization for Scientific Research (NWO). 
References
1. M.P. Ekstrom, Digital Image Processing Techniques, Orlando, FL: Academic Press, Chapter 8, p. 298, 1984.
2. Proceedings of the 11th International Conference on Pattern Recognition, The Hague, The Netherlands, September 1992, sessions A3, A9 and A14.
3. J.A. Parker, R.V. Kenyon, and D.E. Troxel, "Comparison of Interpolating Methods for Image Resampling," IEEE Transactions on Medical Imaging, Vol. 1, pp. 31-39, 1983.
4. M.J. Bentum, R.G.J. Arendsen et al., "Design and Realization of High Speed Single Exposure Dual Energy Image Processing," Proceedings of the Fifth IEEE Symposium of Computer Based Medical Systems, Durham, North Carolina, June 1992, pp. 25-34.
5. M.J. Bentum, M.A. Boer, A.G.J. Nijmeijer, M.M. Samsom, and C.H. Slump, "Resampling of Images in Real-Time," Proceedings of the IEEE ProRISC Workshop on Circuit, Systems and Signal Processing, Papendal, The Netherlands, March 1994, pp. 21-26.
6. J. Smit et al., "The MoDL Hardware Design System," Proceedings 8th Int. Conf. Computer Hardware Description Languages and Their Applications, pp. 327-342, April 1987.
7. M.M. Samsom, "VLSI System Design Training at the University of Twente by Means of Student Design Teams," Proceedings of the Third Eurochip Workshop on VLSI Design Training, Grenoble, France, October 1992, pp. 66-71.
8. Mentor Graphics, IDEA Series Schematic Capture User's Manual, 1989/1990.
9. Mentor Graphics, IDEA Series QuickSim User's Manual, 1989/1990.
10. Mentor Graphics, IDEA Series Cell Station User's Manual, 1989/1990.
11. H. Fuchs, J. Poulton, J. Eyles, T. Greer, J. Goldfeather, D. Ellsworth, S. Molnar, G. Turk, B. Tebbs, and L. Israel, "Pixel-Planes 5: A Heterogeneous Multiprocessor Graphics System Using Processor-Enhanced Memories," Computer Graphics, Vol. 23, No. 3, pp. 79-88, July 1989.
12. M.J. Bentum and J. Smit, "Design of a Parallel VLSI Engine for Real-Time Visualization of 3D Medical Images," Proceedings of SPIE Medical Imaging 1994, Newport Beach, California, February 1994, Vol. 2164, pp. 370-381.
13. A.G.J. Nijmeijer, M.A. Boer, C.H. Slump, M.M. Samsom, M.J. Bentum, G.J. Laanstra, J. Smit, and O.E. Herrmann, "Correction of Lens-Distortion for Real-Time Image Processing Systems," Proceedings of the 1993 IEEE Workshop on VLSI Signal Processing, Veldhoven, The Netherlands, October 1993, pp. 316-324.
Martin M. Samsom was born in Amsterdam, The Netherlands in 1966. In 1989 he received his M.Sc. degree in Electrical Engineering from the University of Twente, Enschede, The Netherlands. Currently he is working as a lecturer in VLSI System Design. His main interests are VLSI System Design, High Level Description Methods and Digital Signal Processing.
Mark J. Bentum was born in Smilde, The Netherlands in 1967. In 1988 he received the B.S. degree in Electrical Engineering from the Polytechnical High School of Groningen, The Netherlands. In 1991 he received his M.Sc. degree (with distinction) in Electrical Engineering from the University of Twente, Enschede, The Netherlands. Currently he is working towards his Ph.D. degree at the University of Twente. His main research interests are parallel processing systems for computer graphics and medical electronics, including realization of algorithms in VLSI. He can be contacted at: mark@nt.el.utwente.nl
Cornelis H. Slump received the M.Sc. degree in Electrical Engineering from Delft University of Technology, Delft, The Netherlands in 1979. In 1984 he obtained his Ph.D. in physics from the University of Groningen, The Netherlands. From 1983 to 1989 he was employed at Philips Medical Systems in Best as head of a predevelopment group on medical image processing. In 1989 he joined the Network Theory group of the University of Twente, Enschede, The Netherlands. His main research interest is in digital signal processing, including realization of algorithms in VLSI.
