Fast fixed-point bicubic interpolation algorithm on FPGA by Koljonen, Janne et al.
  
 
This is a self-archived – parallel published version of this article in the 
publication archive of the University of Vaasa. It might differ from the original. 
Author(s): Koljonen, Janne; Bochko, Vladimir A.; Lauronen, Sami J.; 
Alander, Jarmo T. 
Title: Fast fixed-point bicubic interpolation algorithm on FPGA 
Year: 2019 
Version: Accepted manuscript  
Copyright ©2019 IEEE. Personal use of this material is permitted. 
Permission from IEEE must be obtained for all other uses, in any 
current or future media, including reprinting/republishing this 
material for advertising or promotional purposes, creating new 
collective works, for resale or redistribution to servers or lists, or 
reuse of any copyrighted component of this work in other works. 
Please cite the original version: 
 Koljonen, J., Bochko, V.A., Lauronen, S.J., & Alander, J.T., 
(2019). Fast fixed-point bicubic interpolation algorithm on 
FPGA. In: IEEE Nordic Circuits and Systems Conference 
(NORCAS): NORCHIP and International Symposium of 
System-on-Chip (SoC), Helsinki, Finland (pp. 1–7). Institute of 
Electrical and Electronics Engineers (IEEE). 
https://doi.org/10.1109/NORCHIP.2019.8906933 
 
Fast Fixed-point Bicubic Interpolation Algorithm on FPGA
1st Janne Koljonen, School of Technology and Innovations, University of Vaasa
Vaasa, Finland
https://orcid.org/0000-0001-5834-4437
2nd Vladimir A. Bochko, School of Technology and Innovations, University of Vaasa
Vaasa, Finland
https ://orcid. org/0000-0002 -3 5O5 -3 61 1
3rd Sami J. Lauronen, School of Technology and Innovations, University of Vaasa
Vaasa, Finland
https://orcid.org/0000-0002-3767-045X
4th Jarmo T. Alander, School of Technology and Innovations, University of Vaasa
Vaasa, Finland
https://orcid.org/0000-0002-7161-8081
Abstract: We propose a fast fixed-point algorithm for bicubic interpolation on FPGA. Bicubic interpolation algorithms on FPGA are mainly used in image processing systems and are based on floating-point calculation. In these systems, calculations are synchronized with the frame rate, and the reduction of computation time is achieved by designing a particular hardware architecture. Our system is intended to work with images or other similar applications, such as industrial control systems. Fast and energy-efficient calculation is achieved using a fixed-point implementation. We obtained a maximum frequency of 27.26 MHz, a relative quantization error of 0.36% with 7 fractional bits, logic utilization of 8%, and about 30% energy saving in comparison with a C-program on the embedded HPS for the popular Matlab test function Peaks(25,25) data on the SoCKit development kit (Terasic), chip: Cyclone V, 5CSXFC6D6F31C8. The experiments confirm the feasibility of the proposed method.
Index Terms: control, fixed-point algorithm, bicubic interpolation, FPGA, energy efficiency
I. INTRODUCTION
Interpolation is widely used in different areas of engineering and science, particularly for image generation and analysis in remote sensing, computer graphics, medicine, and digital terrain modelling [1-4]. The most popular methods in digital image scaling are nearest-neighbor and bilinear interpolation. However, nearest-neighbor interpolation produces stair-stepping on the edges of objects, while bilinear interpolation produces blurring [3]. Bicubic interpolation is, in turn, slightly more computationally complex but gives better image quality. FPGA-based real-time super-resolution is introduced in [5], where the FPGA-based system reduces motion blur in images. A fisheye lens distortion correction system based on FPGA with a pipeline architecture is proposed in [6]. An FPGA-based fuzzy logic system is utilized in image scaling [7]; the architecture is based on pipelining and parallel processing to optimize computation time. A bilinear interpolation method for FPGA implementation has been used to improve the quality of image scaling [8]; for preprocessing purposes, sharpening and smoothing filters are adopted, followed by a bilinear interpolator. An adaptive image resizing algorithm is verified in FPGA [9]; the architecture consists of multi-stage parallel pipelines.

Implementations of bicubic interpolation using FPGA for image scaling [10, 11] usually use floating-point arithmetic. In [11], the floating-point multiplication is replaced by a look-up table method, and the convolution is designed using a library of parameterized modules. These methods deal with a batch of data, i.e., all image frame pixels are available concurrently, and the purpose is to provide real-time video processing at the image frame rate.

Our task is different, as the goal also includes high-speed industrial control applications, where data arrive sequentially from sensors at a high rate and the interpolated control data have to be sent to the actuators with a low latency that can only be achieved using an FPGA or ASIC. Our control system is similar to look-up table implementations of fuzzy controllers, e.g. [12]. In real-time applications, it is computationally efficient to implement the nonlinear control surface as a (possibly multi-dimensional) look-up table, which is obtained by spatial sampling of the continuous control surface. The control output samples can use either floating- or fixed-point representation. Subsequently, the interpolated control outputs between the sample grid points can be computed at runtime.

In contrast to the studies presented in [10, 11], we implement the interpolation algorithm using fixed-point arithmetic. The objective is to obtain accurate data quantization while working at the same rate as the data arrive. Obviously, the use of fixed-point numbers introduces round-off errors at several phases: quantization of measurements, sampling, and internal calculations. The benefits of fixed-point algorithms include reduced logic complexity and, consequently, a higher operating frequency.

This study was supported by the Academy of Finland (project SA/SICSURFIS). 978-1-7281-2769-9/19/$31.00 ©2019 IEEE
[Fig. 1 diagram: the 4 × 4 neighborhood of samples f_{(i-1,j-1)} ... f_{(i+2,j+2)} around the interpolation point, with the origin (0, 0) and the deviation Δt_x along the x-axis.]
Fig. 1. Notations used in bicubic interpolation. Note the image-processing convention of the y-axis pointing downward.
For fixed-point implementations, there are several competing optimization objectives. On the one hand, the quantization error should be minimized. On the other hand, resource use and latency should be minimized and throughput maximized. One solution is to find a suitable wordlength that serves all the objectives reasonably well. Additionally, the internal arithmetic can be implemented smartly: avoiding complex arithmetic in favor of, e.g., additions and shift operations, and using the ability of VHDL to define custom data types with only the required number of bits, can result in significant resource savings. This makes fixed-point calculation a demanding problem when implementing it in FPGA. Reference [13] defines two main methods to optimize the wordlength for fixed-point computations. First, the fixed-point implementation can be compared to the equivalent floating-point system by simulation. Second, several analytical approaches can be used. We use the simulation approach.
II. BICUBIC INTERPOLATION
The objective is to interpolate a two-dimensional function F(x, y) defined on a regular rectangular grid (Fig. 1). The function values f_{(i,j)} are known at the grid intersection points. The interpolation point (x, y) lies down and to the right of a grid point (i, j), with a deviation (Δt_x, Δt_y) from that grid point. To interpolate one point, 4 × 4 = 16 grid points plus the deviations (Δt_x, Δt_y) are needed. This is a good example of how speed can be traded against resources with FPGAs: we can compute (i, Δt_x) and (j, Δt_y) either in parallel to gain speed or sequentially to minimize hardware. In either case, we can define a hardware module (using a fixed-point approach) that handles one dimension. Bicubic spline interpolation requires the solution of a linear system, described in [4], for each grid cell. An interpolator with similar properties can be obtained by applying a convolution with the following kernel in both dimensions:
$$W(x) = \begin{cases} (a+2)|x|^3 - (a+3)|x|^2 + 1 & \text{for } |x| \le 1, \\ a|x|^3 - 5a|x|^2 + 8a|x| - 4a & \text{for } 1 < |x| < 2, \\ 0 & \text{otherwise,} \end{cases} \qquad (1)$$
where a is usually set to -0.5 or -0.75. Note that W(0) = 1 and W(n) = 0 for all nonzero integers n. This method was proposed by Keys, who showed third-order convergence with respect to the sampling interval of the original function [14]. Using matrix notation for the common case a = -0.5, the interpolation can be expressed as follows:
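$$p(\Delta t) = \frac{1}{2} \begin{bmatrix} 1 & \Delta t & \Delta t^2 & \Delta t^3 \end{bmatrix} \begin{bmatrix} 0 & 2 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 2 & -5 & 4 & -1 \\ -1 & 3 & -3 & 1 \end{bmatrix} \begin{bmatrix} f_{-1} \\ f_0 \\ f_1 \\ f_2 \end{bmatrix} \qquad (2)$$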
for Δt ∈ [0, 1) in one dimension. Note that 1-D cubic convolution interpolation requires four sample points: for each query, two samples are located to the left and two to the right of the point of interest. These points are indexed from -1 to 2 in this paper, and the distance from the point indexed 0 to the query point is denoted by Δt; with the samples made explicit, (2) is written p(Δt, f_{-1}, f_0, f_1, f_2). For a point of interest in a 2-D grid, interpolation is first applied four times in the x direction and then once in the y direction:
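$$\begin{aligned} b_{-1} &= p(\Delta t_x, f_{(i-1,j-1)}, f_{(i,j-1)}, f_{(i+1,j-1)}, f_{(i+2,j-1)}), \\ b_0 &= p(\Delta t_x, f_{(i-1,j)}, f_{(i,j)}, f_{(i+1,j)}, f_{(i+2,j)}), \\ b_1 &= p(\Delta t_x, f_{(i-1,j+1)}, f_{(i,j+1)}, f_{(i+1,j+1)}, f_{(i+2,j+1)}), \\ b_2 &= p(\Delta t_x, f_{(i-1,j+2)}, f_{(i,j+2)}, f_{(i+1,j+2)}, f_{(i+2,j+2)}), \\ p(x, y) &= p(\Delta t_y, b_{-1}, b_0, b_1, b_2). \end{aligned} \qquad (3)$$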
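The following Python sketch (an illustration, not the authors' VHDL or Matlab code) evaluates (2) in floating point, checks it against a direct convolution with the kernel (1) for a = -0.5, and applies (3) on a matrix extended at the margins as described in the next paragraph. The function names and the 0-based, row-major storage f[j][i] (row = y, column = x) are assumptions made for the example.

```python
# Floating-point sketch of eqs. (1)-(3) with a = -0.5 (illustrative only).

def keys_kernel(x, a=-0.5):
    """Cubic convolution kernel W(x) of eq. (1)."""
    ax = abs(x)
    if ax <= 1:
        return (a + 2) * ax**3 - (a + 3) * ax**2 + 1
    if ax < 2:
        return a * ax**3 - 5 * a * ax**2 + 8 * a * ax - 4 * a
    return 0.0

# Coefficient matrix of eq. (2); row r is multiplied by dt**r.
M = [[0, 2, 0, 0],
     [-1, 0, 1, 0],
     [2, -5, 4, -1],
     [-1, 3, -3, 1]]

def p(dt, fm1, f0, f1, f2):
    """Eq. (2): 1-D cubic convolution for dt in [0, 1)."""
    fv = (fm1, f0, f1, f2)
    c = [sum(M[r][k] * fv[k] for k in range(4)) for r in range(4)]
    return 0.5 * (c[0] + c[1] * dt + c[2] * dt**2 + c[3] * dt**3)

def p_kernel(dt, fm1, f0, f1, f2):
    """The same value computed directly as a convolution with W."""
    return (fm1 * keys_kernel(dt + 1) + f0 * keys_kernel(dt)
            + f1 * keys_kernel(dt - 1) + f2 * keys_kernel(dt - 2))

def extend(f):
    """Repeat the top row and left column once and the bottom row and
    right column twice: s_y-by-s_x data become (s_y + 3)-by-(s_x + 3)."""
    rows = [[r[0]] + list(r) + [r[-1], r[-1]] for r in f]
    return [rows[0]] + rows + [rows[-1], rows[-1]]

def bicubic(fe, x, y):
    """Eq. (3) on the extended matrix fe. x and y are 0-based coordinates
    of the original data; fe[j + 1][i + 1] holds f_(i, j)."""
    i, j = int(x), int(y)
    dtx, dty = x - i, y - j
    b = [p(dtx, *[fe[j + 1 + dj][i + 1 + di] for di in (-1, 0, 1, 2)])
         for dj in (-1, 0, 1, 2)]          # b_-1, b_0, b_1, b_2
    return p(dty, *b)

if __name__ == "__main__":
    f = [[float(3 * c + 2 * r) for c in range(5)] for r in range(4)]
    fe = extend(f)
    print(bicubic(fe, 2.0, 1.0))   # a grid point returns f[1][2]
    print(bicubic(fe, 1.5, 1.25))  # interior point of a linear surface
```

Because the kernel with a = -0.5 reproduces linear functions, the second call returns the plane's value 3·1.5 + 2·1.25 up to rounding.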
The size of the data matrix f is denoted by s_x × s_y. To enable interpolation also at the edge points, we extend the data to the top and left margins by repeating the top row and the left column, respectively, and to the right and bottom margins by repeating the right column and the bottom row twice, respectively. Thus, the size of the extended matrix is s_{ex} × s_{ey} = (s_x + 3) × (s_y + 3).

III. FIXED-POINT NUMBERS AND ARITHMETIC

We use Q_{m,n} numbers to define the m integer and n fractional bits of the fixed-point representation. The fractional part determines the interpolation and quantization resolution, i.e., the interval between two consecutive numbers or interpolated points, which is |Δt_i - Δt_{i-1}| = 2^{-n}. In general, the data range determines the number of integer bits needed. Here, m is determined from the absolute maximum value of the given data set f. In addition, from (2) we note that Δt < 1 and that the absolute values of the matrix entries are integers in the range [0, 5]. Multiplication by 2 and 4 can be replaced by left shifts. Since the entries 3 and 5 can be decomposed as (2 + 1) and (4 + 1), respectively, multiplication by 3 and 5 can be replaced by a left shift and one summation.

Finally, we assume that m is the number of bits representing the absolute maximum value of f shifted left twice. The given data contain both positive and negative values; therefore, signed numbers are used and a sign bit is also needed. The wordlength for f is thus m + n + 1. The corresponding wordlengths for x and y are m_x + n + 1 and m_y + n + 1, where m_x and m_y are the least numbers of bits needed to represent the extended data matrix sizes s_{ex} and s_{ey}, respectively.

[Fig. 2 diagram: HPS (ARM) connected to the FPGA.]
Fig. 2. The HPS-FPGA interaction scheme. The HPS does data preprocessing, testing and reporting. The fixed-point algorithm is implemented in FPGA.
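A minimal Python sketch of the shift-and-add idea, assuming that all operands are plain Python integers already carrying n fractional bits (the helper names are hypothetical). The products with the matrix entries 2, 3, 4 and 5 in (2) use only shifts and additions; the dot product with [1, Δt, Δt², Δt³] still uses integer multiplication, and the single truncation at the end is one possible rounding choice, not necessarily the one used in the authors' VHDL.

```python
# Shift-and-add evaluation of eq. (2) on Q(m, n) integers (illustrative sketch).

N = 7  # fractional bits n, as in the experiments

def to_q(v, n=N):
    """Quantize a real value to an integer with n fractional bits."""
    return round(v * (1 << n))

def times3(v):
    return (v << 1) + v          # 3v = 2v + v

def times5(v):
    return (v << 2) + v          # 5v = 4v + v

def weighted(fm1, f0, f1, f2):
    """Matrix-vector product of eq. (2) without multipliers."""
    c0 = f0 << 1                                     #  2 f0
    c1 = f1 - fm1                                    # -f_-1 + f1
    c2 = (fm1 << 1) - times5(f0) + (f1 << 2) - f2    #  2 f_-1 - 5 f0 + 4 f1 - f2
    c3 = -fm1 + times3(f0) - times3(f1) + f2         # -f_-1 + 3 f0 - 3 f1 + f2
    return c0, c1, c2, c3

def p_fixed(dt_q, fq, n=N):
    """Eq. (2) with dt_q and the four samples fq given with n fractional
    bits. Returns the interpolated value with n fractional bits."""
    c0, c1, c2, c3 = weighted(*fq)
    dt2 = (dt_q * dt_q) >> n     # keep n fractional bits of dt^2
    dt3 = (dt2 * dt_q) >> n      # keep n fractional bits of dt^3
    # Every term below has 2n fractional bits.
    acc = (c0 << n) + c1 * dt_q + c2 * dt2 + c3 * dt3
    return acc >> (n + 1)        # drop n fractional bits and apply the factor 1/2

def p_float(dt, fm1, f0, f1, f2):
    """Double-precision reference for comparison."""
    return 0.5 * (2 * f0 + (f1 - fm1) * dt
                  + (2 * fm1 - 5 * f0 + 4 * f1 - f2) * dt**2
                  + (-fm1 + 3 * f0 - 3 * f1 + f2) * dt**3)

if __name__ == "__main__":
    samples = (1.0, 2.5, -0.75, 3.0)
    fq = [to_q(v) for v in samples]
    approx = p_fixed(to_q(0.375), fq) / (1 << N)
    print(approx, p_float(0.375, *samples))
```

In hardware the same structure maps to adders and wired shifts; in Python the shifts are ordinary integer operations, which also behave correctly for negative values.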
A. Fixed-point Implementation in VHDL
We could use a fixed-point package for modeling [15]. However, this package may not be available in the electronic design automation tools used to program the design functionality into the FPGA. In addition, the weighting in bicubic interpolation can be carried out with arithmetic operations that avoid multiplication, division, and other time- and resource-consuming operations, which simplifies the design of the fixed-point calculations. Therefore, we model the fixed-point numbers and arithmetic directly in VHDL.
We use both simulation and a Hard Processor System (HPS)-FPGA scheme in the implementation and testing (Fig. 2). Software on the HPS performs the preprocessing of the input data needed for the fixed-point algorithm; we use a Python program for this. The original data have floating-point coordinates in the range [-a, a] for x and [-b, b] for y. The HPS translates these values by adding a + 1 and b + 1 to x and y, respectively, to make them positive values in the ranges [1, s_x] and [1, s_y], which are then suitable for separation into integer and fractional parts. In addition, we multiply the values by 2^n to convert them to fixed-point numbers. After preprocessing, the input data (x, y) are sent to the FPGA. The output of the FPGA is an interpolated value that is read back by the HPS. The HPS divides the interpolated values by 2^n to convert them back to floating-point values. We do not delegate preprocessing to the FPGA, since the focus of the study is on interpolation and the original data are not necessarily floating-point values.
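The HPS-side conversion can be sketched in Python, the language the authors used for preprocessing. This is an illustration under assumptions: truncation with int() is one possible way to convert to fixed point, the function and argument names are hypothetical, and split() only mimics what the FPGA extractor of integer and fractional parts does.

```python
# HPS-side pre- and postprocessing sketch (names are hypothetical).

def preprocess(x, y, a, b, n):
    """Translate x in [-a, a] and y in [-b, b] so that they start at 1,
    then scale by 2**n to fixed point (truncating the remaining fraction)."""
    xs, ys = x + a + 1, y + b + 1
    return int(xs * (1 << n)), int(ys * (1 << n))

def split(q, n):
    """Integer part (grid index) and fractional part (delta t with n bits)."""
    return q >> n, q & ((1 << n) - 1)

def postprocess(value_q, n):
    """Convert an interpolated fixed-point result back to floating point."""
    return value_q / (1 << n)

if __name__ == "__main__":
    xq, yq = preprocess(0.5, -1.25, a=12, b=12, n=7)
    print(xq, yq, split(xq, 7), postprocess(xq, 7))
```

Since the translated coordinates are positive, truncation here equals rounding toward minus infinity; a design that needs unbiased quantization would round instead.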
We implemented the fixed-point algorithm in VHDL for the FPGA. The dataflow for the bicubic interpolation includes an extractor of the integer and fractional parts, convolution, dot product, and an output register (Fig. 3). The input to the VHDL design is (x, y). First, the component Bicubic interpolation calculates the integer and fractional parts of the input. The integer part gives the indexes (i, j) of matrix f, which is implemented as a VHDL 2D array in a package (fixed control surface). The fractional part defines (Δt_x, Δt_y). This information is used to calculate the convolutions according to (3). Four of the five convolution operations (b_{-1}, ..., b_2) are implemented in parallel. The component Convolution calculates the product between the matrix and the vector of f values in (2) to obtain a weighted composition of the values f, and then passes the result to the component Dot product, which calculates the dot product of the weighted composition and the vector containing Δt and its powers.

[Fig. 3 diagram: within component Bicubic, the blocks Extractor of integer and fractional part, Convolution, Dot product, and a clocked output Register producing the interpolated value.]
Fig. 3. The dataflow for bicubic interpolation.

When the weighted composition is determined, all multiplications are replaced by summations and shifts to accelerate the calculation. The other arithmetic operations are as follows:
• The VHDL package numeric_std provides summation/subtraction of signed integer numbers [16].
• Multiplication/division by a factor 2^k, where k = 1, 2, is replaced by a bit shift.
• The left shift for negative and positive numbers was implemented by keeping the sign bit, shifting all bits to the left, removing the MSB, and inserting 0 at the LSB.
• The right shift for positive numbers was implemented by keeping the sign bit, shifting all bits to the right, inserting 0 at the MSB, and removing the LSB. The right shift for negative numbers was implemented by keeping the sign bit, shifting all bits to the right, inserting 1 at the MSB, and removing the LSB. The difference arises because negative numbers are in two's complement form.
• The VHDL package numeric_std provides multiplication of signed numbers in the component Dot product. If both operands have the same format, the result of the multiplication has two (repeated) sign bits, 2m integer bits, and 2n fractional bits. We denote the word without the sign bits by four parts: m' + m'' + n' + n'' (m' = m'' and n' = n''). To convert the result to the format of the operands, one keeps one (either) sign bit and the m'' + n' bits.
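The shift and product-resizing rules in the list above can be modeled at bit level in Python. The sketch assumes a w-bit two's-complement word with w = m + n + 1; the helper names are hypothetical, and overflow is not checked (a left shift that pushes a significant bit out silently wraps, as the bit-level description implies).

```python
# Bit-level model of the VHDL shift and product-resize rules (sketch).

def wrap(v, w):
    """Interpret the low w bits of v as a signed two's-complement value."""
    v &= (1 << w) - 1
    return v - (1 << w) if v >> (w - 1) else v

def shl1_keep_sign(v, w):
    """Left shift by one: keep the sign bit, drop the bit below it,
    insert 0 at the LSB."""
    u = v & ((1 << w) - 1)
    sign = u >> (w - 1)
    body = (u << 1) & ((1 << (w - 1)) - 1)
    return wrap((sign << (w - 1)) | body, w)

def shr1_keep_sign(v, w):
    """Right shift by one: keep the sign bit; the vacated bit below it gets
    a copy of the sign (0 for positive, 1 for negative numbers); drop the LSB."""
    u = v & ((1 << w) - 1)
    sign = u >> (w - 1)
    return wrap((sign << (w - 1)) | (u >> 1), w)

def resize_product(a, b, m, n):
    """Multiply two Q(m, n) values and return the result in Q(m, n):
    keep one sign bit and the m'' + n' bits, i.e. drop the upper m integer
    bits and the lowest n fractional bits of the full product."""
    prod = a * b
    sign = 1 if prod < 0 else 0
    body = (prod >> n) & ((1 << (m + n)) - 1)
    return wrap((sign << (m + n)) | body, m + n + 1)

if __name__ == "__main__":
    # 1.5 * -2.25 in Q(10, 7): 192 * -288 -> -432, i.e. -3.375
    print(resize_product(192, -288, 10, 7) / 128)
```

Within range, the right-shift rule is exactly Python's arithmetic `>>`, and the product rule equals `(a * b) >> n`; the explicit masks only make the dropped bits visible.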
We do not use hardware multipliers because we use a variable wordlength, which gives more flexibility to scale the design up to any number of bits. Shifting is done simply by array indexing; therefore, DSP logic is not needed.
B. Fixed-point Implementation in Matlab
For verification, we implemented floating-point and fixed-point variants of the algorithm in Matlab. For the fixed-point variant, we use the same Q_{m,n} numbers and the 32-bit Matlab integer data type (int32). The arithmetic operations for the fixed-point algorithm are as follows:
• Matlab supports summation and subtraction of integer numbers.
• Multiplication of variables by factors or by other variables was performed by converting the numbers to the 64-bit integer format; the result was then rescaled by a factor of 2^n and converted back to the 32-bit format.
• Matlab provides division by a factor of 2.

[Fig. 4 diagram: top-level VHDL (SystemOnChip.vhd) containing the QSYS hard processor system (SoC-QSYS.qsys) with the ARM processor and AMMS interfaces, custom FPGA logic (concurrent assignments, processes, component instances, etc.), and a package for global declarations (Types.vhd).]
Fig. 4. Data flow between ARM and FPGA. Notations: Avalon Memory Mapped Slave (AMMS), System on Chip (SoC), and System Integration Tool (QSYS).
IV. SYNTHESIS USING HPS AND FPGA
For synthesis, we use the Terasic Altera SoCKit development board, which combines an HPS (800 MHz dual-core ARM Cortex™-A9 MPCore™ processor) and an FPGA (Cyclone V 5CSXFC6D6F31C6). This section describes the interface between the HPS and the FPGA, the method used to establish communication between them, and the C program used to access the FPGA.
A. Interface between HPS and FPGA
The interface establishes communication between the ARM processor and the FPGA. The dataflow diagram of the interface is given in Fig. 4. The interface consists of the ARM processor (HPS), where the software code is written, compiled, and run, and Avalon Memory Mapped Slave (AMMS) interfaces from HPS to FPGA and from FPGA to HPS. Avalon buses are Intel's definitions for a few general-purpose buses. In this study, they are used to transfer data synchronously from HPS to FPGA and from FPGA to HPS. As both buses are slave buses, the HPS is the master, i.e., data are transferred only when the software side requests it.

The ARM processor and the AMMS buses are instantiated and integrated in QSYS (Intel). Inside QSYS systems, Avalon buses are usually used for communication. Intel also provides the possibility to use arbitrary buses, called conduits, which may be useful for communication between a QSYS system and custom FPGA logic that does not support Avalon buses. The custom FPGA logic implements the parallel signed-integer arithmetic of our fixed-point bicubic interpolation. The top-level entity includes ports to the outside of the SoC (System on Chip), an instance of the QSYS system, and possible instances of the custom FPGA logic components. To make the code more readable and the integration and parametrization of different parts simpler, a VHDL package defining custom global signal types and constants is also declared.
B. Access to FPGA
From the HPS, the Avalon buses are seen as memory-mapped I/O. For this low-level memory access, a program written in C is used. Its purpose is to write the x and y coordinates to two memory addresses of the lightweight bridge and then read the result from another address. The read function can be called immediately after the write function, because the FPGA calculates the result in less time than the delay between the two function calls. Before the write and read functions are used, an initialization function maps the memory addresses of the lightweight bridge into the process memory so that these addresses can be used later.
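The write-write-read protocol can be sketched in Python with the standard mmap and struct modules, although the authors used C for this step. The register offsets are hypothetical (the real ones come from the QSYS address map), and 0xFF200000 is the usual base address of the Cyclone V lightweight HPS-to-FPGA bridge; by default the sketch maps an anonymous buffer so that it runs without hardware. On the board, a C program with volatile 32-bit accesses, as in the paper, gives tighter control over the bus access width than struct.pack_into on an mmap object.

```python
import mmap
import os
import struct

LW_BRIDGE_BASE = 0xFF200000   # Cyclone V lightweight HPS-to-FPGA bridge
SPAN = 0x1000
# Hypothetical register offsets; the real ones come from the QSYS address map.
X_OFFSET, Y_OFFSET, RESULT_OFFSET = 0x00, 0x04, 0x08

def open_bridge(path=None, base=LW_BRIDGE_BASE, span=SPAN):
    """Map the bridge window into process memory. With path=None an
    anonymous buffer stands in for the hardware so the sketch runs
    anywhere; on the board one would pass '/dev/mem' (root required)."""
    if path is None:
        return mmap.mmap(-1, span)
    fd = os.open(path, os.O_RDWR | os.O_SYNC)
    try:
        return mmap.mmap(fd, span, flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE, offset=base)
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed

def write_coords(bridge, x_q, y_q):
    """Write the fixed-point coordinates as unsigned 32-bit words."""
    struct.pack_into("<I", bridge, X_OFFSET, x_q)
    struct.pack_into("<I", bridge, Y_OFFSET, y_q)

def read_result(bridge):
    """Read the interpolated value as a signed 32-bit word."""
    return struct.unpack_from("<i", bridge, RESULT_OFFSET)[0]
```

On the board the calls would be `bridge = open_bridge('/dev/mem')`, then `write_coords(bridge, xq, yq)` followed immediately by `read_result(bridge)`, mirroring the sequence described above.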
V. EXPERIMENTS
We conducted experiments to study the quantization error, complexity, speed, and power/energy consumption of the proposed algorithm. We implemented the floating-point and fixed-point algorithms in Matlab and the fixed-point algorithm in VHDL. The floating-point algorithm (Matlab) was used as the reference for analyzing the finite wordlength errors of the fixed-point implementations in Matlab and FPGA. For simplicity, we refer to the finite wordlength errors caused by quantization of signals, round-off errors occurring in arithmetic operations, and quantization of constants collectively as the quantization error.
A. Input data and wordlength
For testing, we chose the well-known Matlab data generated by the function Peaks(25,25) [17]. The function generates a mixture of 2-D Gaussians. The data matrix size is 25 × 25; thus the range of x and y is [1, 25] and no translation is needed. The original Peaks(25,25) values are multiplied by 30, which gives the data range [-189.79, 239.89]. According to our generalized wordlength representation (Section III), we work with signed Q_{15,7} numbers for f_{(i,j)} and unsigned Q_{5,7} numbers for x and y. Given the Q_{m,n} numbers, Matlab automatically generates a VHDL package containing the constants that determine the various wordlengths used in the fixed-point calculations. The HPS-FPGA scheme is used for the calculation (Fig. 2). The input data represent the coordinates x and y. The HPS multiplies these values by 2^7 for the fixed-point calculation. Finally, the HPS divides the interpolated value by 2^7.
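The test data can be regenerated outside Matlab. The sketch below re-implements the formula of Matlab's peaks function on linspace(-3, 3, 25) in both directions, scales it by 30 as in the text, and checks that the coordinates of a 25 × 25 grid scaled by 2^7 fit in 5 + 7 bits. It is a convenience sketch, not the authors' script.

```python
import math

def peaks(x, y):
    """Formula of Matlab's peaks function."""
    return (3 * (1 - x) ** 2 * math.exp(-x**2 - (y + 1) ** 2)
            - 10 * (x / 5 - x**3 - y**5) * math.exp(-x**2 - y**2)
            - math.exp(-(x + 1) ** 2 - y**2) / 3)

def peaks_grid(size=25, scale=30.0):
    """size-by-size samples on linspace(-3, 3, size), multiplied by scale.
    Row index follows y and column index follows x, as in meshgrid."""
    t = [-3 + 6 * k / (size - 1) for k in range(size)]
    return [[scale * peaks(x, y) for x in t] for y in t]

if __name__ == "__main__":
    f = peaks_grid()
    lo = min(min(row) for row in f)
    hi = max(max(row) for row in f)
    print(f"data range [{lo:.2f}, {hi:.2f}]")
    # Coordinates lie in [1, 25]; scaled by 2**7 they need 5 + 7 bits.
    print((25 << 7).bit_length())
```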
B. Matlab Test
First, we implemented a floating-point algorithm in Matlab. To test it, we generated a 3D surface by interpolating the given matrix f (the Peaks(25,25) data) and then synthesized the projected circle with radius 5 and center at (14, 14). The interpolation results are shown in Fig. 5a. Before the FPGA implementation, we tested the quantization error as a function of the number of fractional bits n, with a 95% confidence interval (CI) (Fig. 5b). Figure 5b shows that a reasonable choice for the number of fractional bits is 7, which gives a relatively small quantization error (mean absolute error of 0.044, 95% CI [0.0014, 0.0731]).

Fig. 5. a) Floating-point interpolation using Matlab. The circle with radius 5 and center at (14, 14) is projected onto the surface interpolating the input data (black curve). b) The mean absolute error (logarithmic scale) vs. the number of fractional bits n. The vertical error bars, scaled by a factor of 4 for visualization, show the confidence interval at level 0.95.
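The wordlength choice can be illustrated with a simplified 1-D simulation in the spirit of the simulation approach. This is not the authors' experiment: it uses random sample values spanning roughly the scaled Peaks range, rounds both the samples and Δt to n fractional bits, and evaluates (2) with integer arithmetic and one final truncation. It only shows the trend of Fig. 5b (the error shrinks roughly geometrically as n grows), not its values.

```python
import random

def p_float(dt, f):
    """Eq. (2) in double precision (reference)."""
    fm1, f0, f1, f2 = f
    return 0.5 * (2 * f0 + (f1 - fm1) * dt
                  + (2 * fm1 - 5 * f0 + 4 * f1 - f2) * dt**2
                  + (-fm1 + 3 * f0 - 3 * f1 + f2) * dt**3)

def p_fixed(dt, f, n):
    """Eq. (2) with samples and dt rounded to n fractional bits, integer
    arithmetic, and a single truncation at the end; returns a float."""
    fm1, f0, f1, f2 = (round(v * (1 << n)) for v in f)
    d = round(dt * (1 << n))
    d2 = (d * d) >> n
    d3 = (d2 * d) >> n
    c1 = f1 - fm1
    c2 = 2 * fm1 - 5 * f0 + 4 * f1 - f2
    c3 = -fm1 + 3 * f0 - 3 * f1 + f2
    acc = ((2 * f0) << n) + c1 * d + c2 * d2 + c3 * d3
    return (acc >> (n + 1)) / (1 << n)

def mean_abs_error(n, samples):
    """Mean absolute difference between fixed- and floating-point results."""
    return sum(abs(p_fixed(dt, f, n) - p_float(dt, f))
               for dt, f in samples) / len(samples)

rng = random.Random(0)
# Random deviations in [0, 1) and sample values roughly in the scaled Peaks range.
SAMPLES = [(rng.random(), tuple(rng.uniform(-190.0, 240.0) for _ in range(4)))
           for _ in range(2000)]

if __name__ == "__main__":
    for n in (3, 5, 7, 9, 11, 13):
        print(n, mean_abs_error(n, SAMPLES))
```

With independent random samples the local slopes are much steeper than on the smooth Peaks surface, so the absolute errors here are larger than those in Fig. 5b; only the dependence on n is meaningful.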
C. FPGA Test
The quantization error was calculated for 10,000 uniformly distributed random points. One set of interpolated points was determined using the floating-point Matlab algorithm, and the other using the fixed-point algorithm on the FPGA. Four quantization error metrics were used in the comparisons at n = 7 (Tab. I): maximum absolute error (MAXAE), mean absolute error (MEANAE), median absolute error (MEDIANAE), and standard deviation (STD). The relative error, defined as the ratio of the maximum absolute error to the maximum absolute value of the signal, is 0.36% at n = 7.
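The four metrics and the relative error can be computed as follows. The sketch assumes that STD is the population standard deviation of the absolute error (the text does not say whether the signed or the absolute error was used) and that the "signal" in the relative error is the floating-point reference.

```python
import statistics

def error_metrics(reference, approx):
    """Quantization error metrics between floating-point reference values
    and fixed-point results (two non-empty sequences of equal length)."""
    if len(reference) != len(approx) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    errs = [abs(a - r) for r, a in zip(reference, approx)]
    max_ae = max(errs)
    return {
        "MAXAE": max_ae,
        "MEANAE": statistics.fmean(errs),
        "MEDIANAE": statistics.median(errs),
        "STD": statistics.pstdev(errs),      # assumption: std of |error|
        "relative_%": 100.0 * max_ae / max(abs(r) for r in reference),
    }

if __name__ == "__main__":
    # Relative error from the Table I maximum (0.87) and the data maximum (239.89).
    print(round(100 * 0.87 / 239.89, 2))     # 0.36, as reported
```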
TABLE I
FOUR QUANTIZATION ERROR METRICS
MAXAE    MEANAE    MEDIANAE    STD
0.87     0.08      0.03        0.13
The quantization error surface is shown in Fig. 6a. The quantization error is nonuniformly distributed over the interpolated surface. To understand the error behavior, we calculated the numerical gradient over the interpolated surface (Fig. 6b). The two plots (Fig. 6b, 6c) indicate that the quantization error increases with increasing gradient. We then calculated the gradient magnitude and the mean absolute error over the interpolated surface (Fig. 6c), computing the mean absolute error for the data in each grid cell. The gradient magnitude is:
$$G = \sqrt{(f'_x)^2 + (f'_y)^2}, \qquad (4)$$
where f'_x and f'_y are the numerical derivatives with respect to the x and y coordinates. There is a noticeable linear dependence between the mean absolute error and the gradient magnitude: the Pearson correlation coefficient is 0.42, which indicates a moderate positive relationship. In addition, we measured the correlation coefficient for a slowly varying industrial application data set; the measured value was 0.8, i.e., a strong correlation. This is in accordance with the nature of bicubic interpolation, which is well suited to smooth data.

Timing analysis was performed using the TimeQuest Timing Analyzer (Intel), which analyzed the solution for delays in the digital circuit. To find the maximum clock frequency, the multicorner mode was used. The obtained result for bicubic interpolation is F_max = 27.26 MHz.

To estimate the complexity and logic utilization of the solution, compilations with several system parameters were made (Tab. II). In this experiment, we varied n, the number of bits in the fractional part of Q_{m,n}, and monitored logic utilization, the number of registers, and the number of DSP blocks. The results show that logic utilization and the total number of registers increase with the number of fractional bits, while the number of DSP blocks does not change.
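Equation (4) and the correlation analysis above can be sketched in Python. The gradient uses central differences in the interior and one-sided differences at the borders (the scheme of Matlab's gradient function); the grid layout z[row][col] and the function names are assumptions for illustration.

```python
import math

def gradient_magnitude(z, h=1.0):
    """G = sqrt(f'_x^2 + f'_y^2) of eq. (4) on a grid z[row][col] with
    spacing h. Needs at least two rows and two columns."""
    rows, cols = len(z), len(z[0])
    g = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
            r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
            fx = (z[r][c1] - z[r][c0]) / ((c1 - c0) * h)
            fy = (z[r1][c] - z[r0][c]) / ((r1 - r0) * h)
            g[r][c] = math.hypot(fx, fy)
    return g

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

For the analysis in the text, one would flatten the per-cell gradient magnitudes and the per-cell mean absolute errors into two lists of equal length and pass them to pearson.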
TABLE II
COMPARISON WITH VARIED SYSTEM PARAMETERS. THE NUMBER OF DSP BLOCKS IS 25 (22%) FOR ALL CASES.
n bits of Q_{m,n}    n=4     n=5     n=7     n=9     n=11    n=13
Logic utilization    2,528   2,952   3,356   3,799   4,144   4,545
                     6%      7%      8%      9%      10%     11%
Fig. 6. a) Quantization error surface. b) The gradient over the interpolated surface; the highest gradient values are shown in white. c) Mean absolute error vs. gradient magnitude, showing a moderate relationship.

Finally, we measured power and energy consumption with and without the FPGA accelerator using the same SoC board (Fig. 7). For the calculation, we used the same 10,000 uniformly distributed random points as in the quantization test. The measurements were made using an Agilent DSO-X 4024A oscilloscope (Tab. III). Tests with the C-program running on the HPS and with the accelerated program using HPS-FPGA were run eight times each. We measured the static and dynamic parameters. Table III shows that the static power of the HPS configuration is higher than that of HPS-FPGA, even though static power depends on the number of active logic elements. The average dynamic power of the HPS-only configuration is lower than with the FPGA accelerator (0.28 W vs. 0.34 W). However, the computation time with HPS-FPGA is shorter (on average 59% of the C-program time), and as a result the total energy consumption is lower (31.57% less). We note that fixed costs due to reading and writing files and preprocessing the data reduce the overall percentage savings in execution time and energy consumption.

TABLE III
POWER (P) AND ENERGY (E) FOR HPS (C-PROGRAM) AND HPS-FPGA USING THE SAME SOC BOARD FOR EIGHT MEASUREMENTS. THE INDEX H STANDS FOR HPS AND F STANDS FOR HPS-FPGA.
Parameter, rms       Average value and confidence interval
P_H (static), W      5.7, 95% CI [5.7, 5.71]
P_F (static), W      5.46, 95% CI [5.46, 5.46]
P_H (dynamic), W     0.28, 95% CI [0.259, 0.301]
P_F (dynamic), W     0.34, 95% CI [0.32, 0.36]
E_H, J               0.19, 95% CI [0.169, 0.211]
E_F, J               0.13, 95% CI [0.123, 0.137]
E_saved, %           31.57
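The percentages quoted in the text follow from the averages in Table III. A quick check in Python, computed from the rounded table values (so the energy saving comes out as about 31.6% rather than exactly 31.57%):

```python
def percent_saving(before, after):
    """Relative reduction in percent."""
    return 100.0 * (before - after) / before

# Averages from Table III (rounded as printed).
P_STATIC_H, P_STATIC_F = 5.70, 5.46   # static power, W
E_H, E_F = 0.19, 0.13                 # energy, J

if __name__ == "__main__":
    print(f"static power saving: {percent_saving(P_STATIC_H, P_STATIC_F):.1f} %")
    print(f"energy saving:       {percent_saving(E_H, E_F):.1f} %")
```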
VI. CONCLUSIONS
In this paper, we proposed a hardware implementation of an accurate fixed-point bicubic interpolation intended for an industrial control system. General recommendations for wordlength selection depending on the input data format were given. In the experiments, we used signed Q_{15,7} numbers for the interpolated values and unsigned Q_{5,7} numbers for the input values. These values can be changed, because the constants depending on them are automatically calculated in Matlab for the VHDL package. For the function Peaks(25,25), the chosen Q_{m,n} numbers for the input and output gave a relative quantization error of 0.36%, and the design achieved a frequency of 27.26 MHz. The HPS-FPGA energy consumption was about 31% lower than when using a C-program running only on the HPS of the same chip. The HPS-FPGA static power was 4.2% lower than with the C-program. In the future, we plan to implement fixed-point bicubic interpolation for images.

Fig. 7. Power oscillogram for HPS (a) and HPS-FPGA (b) (one measurement). The static power for HPS-FPGA is lower, while the dynamic power is higher than for HPS. The HPS-FPGA computation time is shorter than that of HPS and, as a result, the energy consumption is lower (31.57% less). The time step is 25 ms and the measurement time interval is 2 s.
ACKNOWLEDGMENT
We thank Markku Suistala from the Vaasa University of Applied Sciences, Finland, for help with the FPGA energy measurements.
REFERENCES
[1] J. F. Hughes, A. Van Dam, J. D. Foley, M. McGuire, S. K. Feiner, and D. F. Sklar, Computer Graphics: Principles and Practice, Pearson Education, 2014.
[2] J. Garnero and D. Godone, "Comparisons between different interpolation techniques," The Role of Geomatics in Hydrogeological Risk, Padua, Italy, The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XL-5/W3, Feb. 2013, pp. 139-144.
[3] C. C. Lin, M. H. Sheu, H. K. Chiang, Z. C. Wu, J. Y. Tu, and C. H. Chen, "A low-cost VLSI design of extended linear interpolation for real time digital image processing," in 2008 International Conference on Embedded Software and Systems, July 2008, pp. 196-202.
[4] T. M. Lehmann, C. Gonner, and K. Spitzer, "Survey: Interpolation methods in medical image processing," IEEE Transactions on Medical Imaging, vol. 18, November 1999, pp. 1049-1075.
[5] M. E. Angelopoulou, C. S. Bouganis, P. Y. Cheung, and G. A. Constantinides, "FPGA-based real-time super-resolution on an adaptive image sensor," in International Workshop on Applied Reconfigurable Computing, Springer, Berlin, Heidelberg, March 2008, pp. 125-136.
[6] N. Bellas, S. M. Chai, M. Dwyer, and D. Linzmeier, "Real-time fisheye lens distortion correction using automatically generated streaming accelerators," in 2009 17th IEEE Symposium on Field Programmable Custom Computing Machines, April 2009, pp. 149-156.
[7] A. Amanatiadis, I. Andreadis, and K. Konstantinidis, "Design and implementation of a fuzzy area-based image-scaling technique," IEEE Transactions on Instrumentation and Measurement, vol. 57, August 2008, pp. 1504-1513.
[8] N. Vidyashree and S. Usharani, "Implementation of image scalar based on bilinear interpolation using FPGA," IJARECE, vol. 4, June 2015, pp. 1620-1624.
[9] J. Xiao, X. Zou, Z. Liu, and X. Guo, "Adaptive interpolation algorithm for real-time image resizing," in First International Conference on Innovative Computing, Information and Control, vol. 2, Aug. 2006, pp. 221-224.
[10] M. A. Nuno-Maganda and M. O. Arias-Estrada, "Real-time FPGA-based architecture for bicubic interpolation: an application for digital image scaling," in 2005 International Conference on Reconfigurable Computing and FPGAs, Sep. 2005, 8 pp.
[11] Y. Zhang, Y. Li, J. Zhen, J. Li, and R. Xie, "The hardware realization of the bicubic interpolation enlargement algorithm based on FPGA," in 2010 Third International Symposium on Information Processing, Oct. 2010, pp. 277-281.
[12] J. Jantzen, "Tuning of fuzzy PID controllers," Technical University of Denmark, report, 1998.
[13] R. Cmar, L. Rijnders, P. Schaumont, S. Vernalde, and I. Bolsens, "A methodology and design environment for DSP ASIC fixed point refinement," in Design, Automation and Test in Europe Conference and Exhibition, Proceedings (Cat. No. PR00078), 1999, pp. 271-276.
[14] R. Keys, "Cubic convolution interpolation for digital image processing," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29(6), 1981, pp. 1153-1160.
[15] D. Bishop, "Fixed point package users guide," Packages and bodies for the IEEE, 2010, pp. 1076-2008.
[16] Doulos: https://www.doulos.com/knowhow/vhdl_designers_guide/numeric_std/, Last access: 14.05.2019.
[17] MathWorks: https://se.mathworks.com/help/matlab/ref/peaks.html, Last access: 22.05.2019.
