On Board Georeferencing Using FPGA-Based Optimized Second Order Polynomial Equation by Liu, Dequan et al.
Old Dominion University
ODU Digital Commons
Electrical & Computer Engineering Faculty
Publications Electrical & Computer Engineering
2019
On Board Georeferencing Using FPGA-Based
Optimized Second Order Polynomial Equation
Dequan Liu
Guoqing Zhou
Jingjin Huang
Rongting Zhang
Lei Shu
See next page for additional authors
Follow this and additional works at: https://digitalcommons.odu.edu/ece_fac_pubs
Part of the Systems and Communications Commons
This Article is brought to you for free and open access by the Electrical & Computer Engineering at ODU Digital Commons. It has been accepted for
inclusion in Electrical & Computer Engineering Faculty Publications by an authorized administrator of ODU Digital Commons. For more information,
please contact digitalcommons@odu.edu.
Repository Citation
Liu, Dequan; Zhou, Guoqing; Huang, Jingjin; Zhang, Rongting; Shu, Lei; Zhou, Xiang; and Xin, Chun Sheng, "On Board
Georeferencing Using FPGA-Based Optimized Second Order Polynomial Equation" (2019). Electrical & Computer Engineering Faculty
Publications. 205.
https://digitalcommons.odu.edu/ece_fac_pubs/205
Original Publication Citation
Liu, D., Zhou, G., Huang, J., Zhang, R., Shu, L., Zhou, X., & Xin, C. S. (2019). On-board georeferencing using FPGA-based optimized
second-order polynomial equation. Remote Sensing, 11(124), 1-28. doi:10.3390/rs11020124
Authors
Dequan Liu, Guoqing Zhou, Jingjin Huang, Rongting Zhang, Lei Shu, Xiang Zhou, and Chun Sheng Xin
This article is available at ODU Digital Commons: https://digitalcommons.odu.edu/ece_fac_pubs/205
remote sensing  
Article
On-Board Georeferencing Using FPGA-Based
Optimized Second-Order Polynomial Equation
Dequan Liu 1,3, Guoqing Zhou 1,2,3,4,*, Jingjin Huang 2,3, Rongting Zhang 2,3, Lei Shu 2,3,
Xiang Zhou 1,4 and Chun Sheng Xin 5
1 School of Microelectronics, Tianjin University, Tianjin 300072, China; 1017232001@tju.edu.cn (D.L.);
zqx0711@tju.edu.cn (X.Z.)
2 School of Precision Instrument & Opto-Electronics Engineering, Tianjin University, Tianjin 300072, China;
jingjin_huang@tju.edu.cn (J.H.); zrt65@tju.edu.cn (R.Z.); shulei@tju.edu.cn (L.S.)
3 The Center for Remote Sensing, Tianjin University, Tianjin 300072, China
4 GuangXi Key Laboratory for Spatial Information and Geomatics, Guilin University of Technology,
Guilin 541004, China
5 Department of Electrical and Computer Engineering, Old Dominion University, Norfolk, VA 23529, USA;
cxin@odu.edu
* Correspondence: gzhou@glut.edu.cn; Tel.: +86-773-589-6073
Received: 27 November 2018; Accepted: 3 January 2019; Published: 10 January 2019


Abstract: For real-time monitoring of natural disasters, such as fire, volcano, flood,
landslide, and coastal inundation, highly-accurate georeferenced remotely sensed imagery is
needed. Georeferenced imagery can be fused with geographic spatial data sets to provide
geographic coordinates and positing for regions of interest. This paper proposes an on-board
georeferencing method for remotely sensed imagery, which contains five modules: input data,
coordinate transformation, bilinear interpolation, and output data. The experimental results
demonstrate multiple benefits of the proposed method: (1) the computation speed using the proposed
algorithm is 8 times faster than that using PC computer; (2) the resources of the field programmable
gate array (FPGA) can meet the requirements of design. In the coordinate transformation scheme,
250,656 LUTs, 499,268 registers, and 388 DSP48s are used. Furthermore, 27,218 LUTs, 45,823 registers,
456 RAM/FIFO, and 267 DSP48s are used in the bilinear interpolation module; (3) the values of root
mean square errors (RMSEs) are less than one pixel, and the other statistics, such as maximum error,
minimum error, and mean error are less than one pixel; (4) the gray values of the georeferenced image
when implemented using FPGA have the same accuracy as those implemented using MATLAB
and Visual studio (C++), and have a very close accuracy implemented using ENVI software;
and (5) the on-chip power consumption is 0.659 W. Therefore, it can be concluded that the proposed
georeferencing method implemented using FPGA with second-order polynomial model and bilinear
interpolation algorithm can achieve real-time geographic referencing for remotely sensed imagery.
Keywords: georeferencing; second-order polynomial equation; bilinear interpolation; the field
programmable gate array (FPGA)
1. Introduction
With the advancement of technological, remote sensing (RS) images are becoming more widely
used in natural disasters monitoring and positioning [1–3]. They are required to rapidly produce highly
accurate georeferenced image. However, the processing speed of the traditional methods for indirect
georeferencing of RS images cannot to meet the timeliness requirements [4–9]. Therefore, it would be
highly desirable to develop a rapid, inexpensive technique to implement an on-board georeferencing.
Remote Sens. 2019, 11, 124; doi:10.3390/rs11020124 www.mdpi.com/journal/remotesensing
Remote Sens. 2019, 11, 124 2 of 28
This paper proposes a georeferencing method based onfield programmable gate arrays (FPGAs)
with the optimized second-order polynomial equation and bilinear interpolation scheme.
Georeferencing is a key step in geometric correction which aims to establish the relationship
between image coordinates and ground coordinates through various function, such as collinearity
equation models (CEM), polynomial function models (PFM), rational function models (RFM),
and direct linear transformation models (DLT) [10–15]. However, these traditional algorithms were
developed for serial instruction systems based on personal computers (PC). As such, these systems
hardly meet response time demands of time-critical disasters [5].
To solve the limitation of the speed of RS images processing, many researchers proposed
effective alternative processing methods. High-performance computing (HPC) was widely used
in RS data processing system. Cluster computing has already offered access to greatly increased
computational power at a low cost in a few hyperspectral imaging applications [16–18].
Although the algorithms of RS data processing generally map quite nicely to multi-processor systems
composed of clusters or networks of CPUs, these systems are usually expensive and difficult to
adapt to the on-board RS data processing scenarios. For these reasons, the specialized integrated
hardware devices of low weight and low power consumption are essential to reduce mission payload
and obtain analysis results in real-time. FPGAs and graphics processing units (GPUs) exhibit good
potential to allow for on-board real-time analysis of RS data. Fang et al. [19] presented near
real-time approach for CPU/GPU based preprocessing of ZY-3 satellite images. Van et al. [20]
proposed a new method to generate undistorted images by implementing the required distortion
correction algorithm on a commercial GPU. Thomas et al. [21] described the georeferencing of
an airborne hyperspectral imaging system based on pushbroom scanning. Reguera et al. [22]
presented a method for real-time geo-correction of images from airborne pushbroom sensors using
the hardware acceleration and parallel computing characteristics of modern GPU. López-Fandiño
et al. [23] proposed a parallel function with GPU to accelerate the extreme learning machine (ELM)
algorithm which is performed on RS data for land cover applications and achieved competitive
accuracy results. Lu et al. [24] mapped the fusion method of RS images to GPU. The result shown
that the image fusion speed based on GPU was much quicker than that based on CPU when
the image size was getting bigger. Now, the use of GPUs to accelerate RS processing has resulted
in further related research achievements. However, most parallel processing methods based on
GPU multitasking are not alone capable of surmounting the shortcomings of serial instrument
methods [25,26]. Additionally, FPGAs exhibit lower power dissipation figures than GPUs [27].
FPGAs have been consolidated as the standard choice for on-board RS data processing due to
their smaller size, weight, and power consumption when compared to other HPC systems [27–32].
Zhou et al. [28] presented the concept of the “on-board processing system”. Huang et al. [29]
proposed an FPGA architecture that consisted of corner detection, corner matching, outlier rejection,
and sub-pixel precision localization. Pakartipangi et al. [30] analyzed the camera array on-board
data handling using an FPGA for nano-satellite application. Huang et al. [31] proposed a new
FPGA architecture that considered the reuse of sub-image data. Yu et al. [32] presented a new on-board
image compression system architecture for future disaster monitoring by low-earth-orbit (LEO)
satellites. Zhou et al. [5] proposed an on-board ortho-rectification for remote sensing images based on
an FPGA. Qi et al. [7] presented an FPGA and digital singnal processor (DSP) co-processing system
for an optical RS images preprocessing algorithm. Long et al. [33] focused on an automatic matching
technique for the specific task of georeferencing RS images and presented a technical frame to match
large RS images efficiently using the prior geometric information of the images. Williams et al. [34]
discussed the design and implementation of a real-time cloud detection system for on-board RS
platform. González et al. [35] presented an N-finder (N-FINDR) algorithm implementation using
FPGA for hyperspectral image analysis.
This paper presents an on-board georeferencing scheme using FPGA for remote sensing images.
By decomposing the georeferencing algorithm, the proposed georeferencing platform integrates
Remote Sens. 2019, 11, 124 3 of 28
three modules: a data memory, coordinate transformation (including the transformation from geodetic
coordinates to the raw image coordinates and the raw image coordinates to the scanning coordinates),
and bilinear interpolation. The contributions are summarized as follows.
(1) An optimized second-order polynomial equation and bilinear interpolation are proposed
for on-board georeferencing for remotely sensing imagery. (2) An architecture for floating-point
block Lower-Upper (LU) decomposition is designed to solve the inverse for large-sized matrices.
(3) To reduce resource consumption, some strategies are adopted in FPGA implementation,
i.e., 32-bit integer and floating-point mixed operation and serial-parallel data communication.
The remainder of the paper is organized as follows. Section 2 reviews the traditional second-order
polynomial georeferencing algorithm, bilinear interpolation method and optimizes the algorithms
based on the FPGA; Section 3 gives the details of the implementation using FPGA for the georeferencing
scheme; Section 4 describes and validates the experimental results. The conclusion is presented
in Section 5.
2. Optimization for Georeferencing Scheme
The georeferencing scheme includes the selection of a suitable mathematical distortion model,
coordinate transformation, and resampling (interpolation) [36–38]. The georeferencing scheme can be
classified into direct and indirect methods. The direct method applies the raw image coordinates to
compute the georeferenced coordinates, and the indirect method applies the georeferenced image
coordinates to compute the raw image coordinates. In this paper, the indirect method is used to
establish the relationship of coordinate by the second-order polynomial equations.
2.1. A Brief Review of Georeferencing
2.1.1. Traditional Second-Order Polynomial Equations
The second-order polynomial equations are expressed as follows [39–42]:
x = a0 + a1X+ a2Y+ a3X2 + a4XY+ a5Y2, (1)
y = b0 + b1X+ b2Y+ b3X2 + b4XY+ b5Y2, (2)
where (x, y) are the coordinates of the raw image, (X,Y) are the corresponding ground coordinates
(longitude, latitude), and ai and bi (i = 0, 1, . . . , 5) are the unknown coefficients of the second-order
polynomial equation. After choosing a suitable mathematical distortion model, the unknown
coefficients ai and bi can be obtained from Equations (1) and (2) with the ground control points (GCPs).
GCPs are an important parameter in the geometric calibration, which affect the accuracy of subsequent
correction [43]. To ensure accuracy, ten GCPs are selected in the study area. Let the coordinate pairs
of ten GCPs can be represented as: (x1, y1,X1,Y1), (x2, y2,X2,Y2), . . . , (x10, y10,X10,Y10). The matrix
form of Equation (1) can be expressed as:
1 X1 Y1 X21 X1Y1 Y
2
1
1 X2 Y2 X22 X2Y2 Y
2
2
1 X3 Y3 X23 X3Y3 Y
2
3
1 X4 Y4 X24 X4Y4 Y
2
4
...
...
...
...
...
...
1 X10 Y10 X210 X10Y10 Y
2
10

×

a0
a1
a2
a3
a4
a5

=

x1
x2
x3
x4
...
x10

. (3)
According to the least square method [5], Equation (3) can be simplified as Equation (4).
(ATPA)∆a = ATPLx, (4)
Remote Sens. 2019, 11, 124 4 of 28
where A is the matrix of the ground coordinates of GCPs, ∆a is the coefficients matrix of
the second-order polynomial Equation (1), Lx is the coordinates matrix of GCPs in the raw image,
and P is the weight matrix. A , ∆a, Lx and P are expressed by Equation (5) through Equation (8).
A =

1 X1 Y1 X21 X1Y1 Y
2
1
1 X2 Y2 X22 X2Y2 Y
2
2
1 X3 Y3 X23 X3Y3 Y
2
3
1 X4 Y4 X24 X4Y4 Y
2
4
...
...
...
...
...
...
1 X10 Y10 X210 X10Y10 Y
2
10

, (5)
∆a = [ a0 a1 a2 a3 a4 a5 ]
T
, (6)
Lx =
[
x0 x1 x2 x3 x4 x5
]T
, (7)
P =

P1
P2
P3
P4
...
P10

(8)
Typically, the weight of each GCPs is the same. So, P = I. Equation (4) can be descripted as:
∆a = (ATA)
−1
(ATLx). (9)
In the same way, the coefficients of b can be solved by the formula ∆b = (ATA)
−1
(ATLy).
2.1.2. Coordinate Transformation
After establishing the second-order polynomial equation and solving the coefficients of
Equations (1) and (2), the raw image can be georeferenced pixel-to-pixel. The steps are as follows [44]:
1. The Size of the Georeferenced Image
To properly obtain the georeferenced image, the storage area of georeferenced image must be
computed in advance, as shown in Figure 1 [7]. abcd is the raw image, with corners a , b, c, and d in
the o− xy image coordinate system in the Figure 1a. Figure 1b shows the correct range of output image.
a′b′c′d′ is the georeferenced image with corners a′, b′, c′ and d′ in the O − XY ground coordinate
system, and ABCD is the storage area for georeferenced image. Obviously, if the storage range is not
correctly defined, the georeferenced image is improper with a large blank area, as shown in Figure 1c.
Therefore, the right boundary should include the entire georeferenced image with minimum exterior
blank rectangle as possible.
Remote Sens. 2019, 11, 124 5 of 28
Remote Sens. 2019, 11, x FOR PEER REVIEW 4 of 28 
 
2 2
1 1 1 1 1 1
2 2
2 2 2 2 2 2
2 2
3 3 3 3 3 3
2 2
4 4 4 4 4 4
2 2
10 10 10 10 10 10
1
1
1
1
1
X Y X X Y Y
X Y X X Y Y
X Y X X Y Y
A
X Y X X Y Y
X Y X X Y Y
    
=       
     
, (5)
0 1 2 3 4 5[ ]Ta a a a a a aΔ = , (6)
0 1 2 3 4 5
T
xL x x x x x x=    , (7)
1
2
3
4
10
P
P
P
P
P
P
    
=       

 (8)
Typically, the weight of each GCPs is the same. So, P I= . Equation (4) can be descripted as: 
1( ) ( )T Ta xA A A L−Δ = . (9)
In the same way, the coefficients of b can be solved by the formula 1( ) ( )T Tb yA A A L−Δ = . 
2.1.2. Coordinate Transformation 
After establishing the second-order polynomial equation and solving the coefficients of 
Equations (1) and (2), the raw image can be georeferenced pixel-to-pixel. The steps are as follows [44]: 
1. The Size of the Georeferenced Image  
To properly obtain the georeferenced image, the storage area of georeferenced image must be 
computed in advance, as shown in Figure 1 [7]. abcd is the raw image, with corners a , b , c , and d in 
the o xy− image coordinate system in the Figure 1a. Figure 1b shows the correct range of output 
image. ' ' ' 'a b c d is the georeferenced image with corners 'a , 'b , 'c and 'd in the O XY− ground 
coordinate system, and ABCD is the storage area for georeferenced image. Obviously, if the storage 
range is not correctly defined, the georeferenced image is improper with a large blank area, as shown 
in Figure 1c. Therefore, the right boundary should include the entire georeferenced image with 
minimum exterior blank rectangle as possible. 
(a) (b) 
 
(c) 
Figure 1. The georeferenced image size. (a) Input image. (b) Output image. (c) Incorrect output
image. ur:, upper right; ul:, upper left; ll:, lower left; lr:, lower right. abcd is the input image. ABCD is
the storage area for the georeferenced image. a′b′c′d′ is the georeferenced image.
2. Coordinates of the Four Corners
(xul , yul), (xur, yur), (xlr, ylr), and (xll , yll) are the coordinates of the four corner in the raw image
(input image), (Xul ,Yul), (Xur,Yur), (Xlr,Ylr), and (Xll ,Yll) are the corresponding ground coordinates
of the georeferenced image. The maximum and minimum ground coordinates of the storage area for
the georeferenced image (Xmin, Xmax, Ymin, and Ymax) are calculated from the (Xul ,Yul), (Xur,Yur),
(Xlr,Ylr), and (Xll ,Yll).
Xmin = min(Xur,Xul ,Xlr,Xll)
Xmax = max(Xur,Xul ,Xlr,Xll)
Ymin = min(Yur,Yul ,Ylr,Yll)
Ymax = max(Yur,Yul ,Ylr,Yll).
(10)
3. Row (M) and Column (N) of the Georeferenced Image.
M and M can be solved by Equation (11).
M = (Xmax − Xmin)/XGSD + 1
N = (Ymax −Ymin)/YGSD + 1, (11)
where XGSD and YGSD are the ground-sampling distances (GSD) of the output image.
Hence, the location of pixel can be described by the row and column in the B − x′y′ coordinate
system (x′ = 1, 2, 3, . . . , M; y′ = 1, 2, 3, . . . , N).
4. Coordinate Transformation.
The georeferenced model only expresses the relationship between the ground coordinates (X,Y)
and the (x, y) coordinates of the raw image. To further express the relationship between the raw image
coordinates and scanning coordinates of the georeferenced image, it is necessary to convert the ground
coordinate into the row and column of georeferenced image, as shown in Equation (12).
Xg = Xmin + XGSD(x′ − 1)
Yg = Ymax −YGSD(y′ − 1), (12)
where x′ and y′ are the row and column of the georeferenced image, respectively, and (Xg,Yg)
are the corresponding ground coordinate.
0,---------------
b (x,,,y,,,) (x,,,y,.,) , y 
0 
~--'---'--_JL___j d 
X (x11,yu) (x,,,y,,) 
(X .. ,,.,,Y,.= ) 
A y' y 
i' a 
c' 
j 
Y1 •• • •··•··• ••·· ,r ••••• ) 
' D 
oL--'!::'x~'._~_,._Y_""_> __ M ___ _Jl_(x_ .. _M~;_ .._.•l 
x, X, 
A 
.................. 
a' 
c' 
.............. : D 
o,~ _________ ___::X~ 
Remote Sens. 2019, 11, 124 6 of 28
2.1.3. Resample
As to be expected, the defined grid center from the georeferenced image will not usually project
to the center location of pixel in the raw image after coordinate transformation. To solve this problem,
nearest-neighbor interpolation, bilinear interpolation, and cubic interpolation [45] are commonly
used. To balance the accuracy and complexity of the preprocessing function, bilinear interpolation
is chosen in this paper. The algorithm obtains the pixel value by taking a weighted sum of the pixel
values of the four nearest neighbors surrounding the calculated location [46], as shown in Figure 2
and Equation (13).
I(x, y) = I(i, j)(1− u)(1− v) + I(i+ 1, j)u(1− v) + I(i, j+ 1)(1− u)v+ I(i+ 1, j+ 1)uv, (13)
where, (x, y) are the coordinates of the georeferenced image, I(x, y) is the gray value of a pixel
with (x, y) coordinates in the georeferenced image; i and j are the integer part of the corresponding
coordinate in the raw image, respectively; u and v are the fractional parts of the corresponding
coordinate in the raw image, respectively. I(i, j) is the gray value of (i, j) coordinates in the raw image.
Remote Sens. 2019, 11, x FOR PEER REVIEW 6 of 28 
 
coordinate i  the raw image, respectively; u and v are the fractional parts of the corresponding 
coordinate in the raw image , respectively. ( , )I i j is the gray value of ( , )i j coordinates in the raw image. 
 
(a) 
 
(b) 
Figure 2. The architecture for the bilinear interpolation. (a) Distributions of the four nearest neighbor 
pixels and (b) 3D spatial distribution of the bilinear interpolation. 
2.2. Optimized Georeferencing Scheme 
2.2.1. Optimized Second-Order Polynomial Model  
To parallel implement the Equations (1) and (2), the second-order polynomial equations are 
optimized to Equations (14) and (15). 
2 2
0 1 2 3 4 5 0 1 3 2 4 5
0 1 2
( ) ( )
   = ( ) ( , )
x a a X a Y a X a XY a Y a a a X X a a X a Y Y
a p X p X Y
= + + + + + = + + + + +
+ +
 (14)
2 2
0 1 2 3 4 5 0 1 3 2 4 5
0 1 2
( ) ( )
   = ( ) ( , )
y b b X b Y b X b XY b Y b b b X X b b X b Y Y
b q X q X Y
= + + + + + = + + + + +
+ +
 (15)
where, 1 1 3( ) ( )p X a a X X= + , 2 2 4 5( , ) ( )p X Y a a X a Y Y= + + , 1 1 3( ) ( )q X b b X X= + , and 
2 2 4 5( , ) ( )q X Y b b X b Y Y= + + . 
In Equation (1), eight multipliers and five adders are required to complete a transformation. 
However, in Equation (14), five multipliers and five adders are required to complete a 
transformation. Comparatively, six multipliers are reduced in all. The multipliers and adders 
required to complete the coordinate transformation are in Table 1. 
Table 1. The number of multiplier and adder comparison with Equation (1), (2), (14), and (15) 
Coordinate Traditional 2nd polynomial Optimized 2nd polynomial Reduce 
 Multiplier Adder Multiplier Adder Multiplier Adder 
x 8 5 5 5 
6 0 
y 8 5 5 5 
2.2.2. Optimized Bilinear Interpolation 
Equation (13) appears to require eight multiplications, but careful factorization to exploit 
separability can reduce the multiplication by three [47]. 
Figure 2. The architecture for the bilinear interpolation. (a) Distributions of the four nearest neighbor
pixels and (b) 3D spatial distribution of the bilinear interpolation.
2.2. Optimized Georeferencing Scheme
2.2.1. Optimized Second-Order Polynomial Model
To parallel implement the Equations (1) and (2), the second-order polynomial equations are
optimized to Equations (14) and (15).
x = a0 + a1X+ a2Y+ a3X2 + a4XY+ a5Y2 = a0 + (a1 + a3X)X+ (a2 + a4X+ a5Y)Y
= a0 + p1(X) + p2(X,Y)
(14)
y = b0 + b1X+ b2Y+ b3X2 + b4XY+ b5Y2 = b0 + (b1 + b3X)X+ (b2 + b4X+ b5Y)Y
= b0 + q1(X) + q2(X,Y)
(15)
where, p1(X) = (a1 + a3X)X, p2(X,Y) = (a2 + a4X + a5Y)Y, q1(X) = (b1 + b3X)X,
and q2(X,Y) = (b2 + b4 + b5Y)Y.
In Equation (1), eight multipliers and five adders are required to complete a transformation.
However, in Equation (14), five multipliers and five adders are required to complete a transformation.
Comparatively, six multipliers are reduced in all. The multipliers and adders required to complete
the coordinate transformation are in Table 1.
--;,--(-i, 1_) -....;...---~l (i, j+l) 
I 
u : 
-- -- _':_ -- - ~ - _1::'.1 - -
I (x, y) 
1-u : 
I 
I 
(i+J,j) : (i+l ,j+l) 
Remote Sens. 2019, 11, 124 7 of 28
Table 1. The number of multiplier and adder comparison with Equations (1), (2), (14), and (15).
Coordinate Traditional 2nd Polynomial Optimized 2nd Polynomial Reduce
Multiplier Adder Multiplier Adder Multiplier Adder
x 8 5 5 5
6 0y 8 5 5 5
2.2.2. Optimized Bilinear Interpolation
Equation (13) appears to require eight multiplications, but careful factorization to exploit
separability can reduce the multiplication by three [47].
I1 = I(i, j) + u(I(i+ 1, j)− I(i, j))
I2 = I(i, j+ 1) + v(I(i+ 1, j+ 1)− I(i, j+ 1))
I(x, y) = I1 − v(I2 − I1)
. (16)
To compute the Equation (16), the gray values of four nearest-neighbors must be read from the memory.
The principle of the mapping storage method is to optimize and balance the line and block data access
rate [7]. In order to improve memory access efficiency, a multi-array memory storage method is
designed in Figure 3. The RS image size is m× n. As such, m× n memory units are designed in double
data rate (DDR). The relationship between the address of DDR and (i, j) coordinates can be described
by Equation (17).
addrij = (i− 1)×m+ (j− 1)× n, (17)
where, i and j are the row and column of the raw image, respectively. addrij is the corresponding
address of memory.
Remote Sens. 2019, 11, x FOR PEER REVIEW 7 of 28 
 
1
2
1 2 1
( , ) ( ( 1, ) ( , ))
( , 1) ( ( 1, 1) ( , 1))
( , ) ( )
I I i j u I i j I i j
I I i j v I i j I i j
I x y I v I I
 = + + −
= + + + + − +
= − −
. (16) 
To compute the Equation (16), the gray values of four nearest-neighbors must be read from the 
memory. The principle of the mapping storage method is o optimize and balance the line and bl ck 
data access rate [7]. In order t  improve mem ry access efficiency, a mu ti-array memory storage 
method is designed in Figure 3. The RS image size is m n× . As such, m n× memory units are designed 
in double data rate (DDR). The relationship between the address of DDR and ( , )i j coordinates can be 
escrib d by Equation (17). 
where, i and j are the row and column of the raw image, respectively. ijaddr is the corresponding 
address of memory. 
 
Figure 3. Raw image mapping with double data rate (DDR)  
3. FPGA-Based Implementation of the Georeferencing Scheme 
An FPGA hardware architecture is designed for georeferenced of the remotely sensed imagery 
as shown in Figure 4, which contains five modules: (I) GCP data module; (II) input image module 
(IIM); (III) coefficient calculation module (CCM); (IV) coordinate transformation module (CTM); and 
(V) bilinear interpolation module (BIM). The details of five modules are described as follows. 
 The GCPs data is stored in the RAM of the GCP data module. These parameters are sent to the 
CCM, when the enable signal is being received. 
 The gray values of the input image are saved to the ROM through the IIM, when the enable 
signal is being received. 
 The coefficients of 0a , 1a , 2a , 3a , 4a , 5a , 0b , 1b , 2b , 4b , and 5b are calculated in the CCM when the 
GCP data are arrived. The matrices TA A , T xA L , T yA L , and 1( )TA A − are parallel computed.  
 The coordinate transformation is carried out in the CTM when the coefficients of second-order 
polynomial equation are arrived. The values of i , j , u , 1 u− , v and 1 v− are simultaneously sent 
to the BIM. 
 According to Equation (13) and (16), the interpolation values are calculated. At last, the gray 
values, latitude, and longitude of the georeferenced image are outputted at the same clock 
cycle. 
( 1) ( 1)ijaddr i m j n= − × + − × , (17) 
Figure 3. Raw image mapping with double data rate (DDR)
3. FPGA-Based Implementation of the Georeferencing Scheme
An FPGA hardware architecture is designed for georeferenced of the remotely sensed imagery
as shown in Figure 4, which contains five modules: (I) GCP data module; (II) input image module
(IIM); (III) coefficient calculation module (CCM); (IV) coordinate transformation module (CTM);
and (V) bilinear interpolation module (BIM). The details of five modules are described as follows.
• The GCPs data is stored i f the GCP data module. Th se parameters ar sent to
the CM, when th enable signal is being r ceived.
• The gray values of the input image ar saved to the ROM through the IIM, when the able signal
is being received.
• The coefficients of a0, a1, a2, a3, a4, a5, b0, b1, b2, b4, and b5 are calculated in the CCM when t e GCP
data are arrived. The matrices ATA , ATLx, ATLy, and (ATA)
−1 are parallel computed.
Remote Sens. 2019, 11, 124 8 of 28
• The coordinate transformation is carried out in the CTM when the coefficients of second-order
polynomial equation are arrived. The values of i, j, u, 1− u, v and 1− v are simultaneously sent
to the BIM.
• According to Equations (13) and (16), the interpolation values are calculated. At last, the gray
values, latitude, and longitude of the georeferenced image are outputted at the same clock cycle.Remote Sens. 2019, 11, x FOR PEER REVIEW 8 of 28 
 
 
Figure 4. A field programmable gate array (FPGA) architecture for the georeferenced remote sensing 
images with second-order polynomial equation and bilinear interpolation scheme. RAM: random access 
memory; clk: clock; rst: reset; en: enable. 
3.1. FPGA-Based Solution of the Second-Order Polynomial Equation 
To implement the second-order polynomial equation based on an FPGA, the processing of 
solving the coefficient is decomposed into three modules (as shown in Figure 5): (I) Form the matrices
A , xL , and yL ; (II) Calculate TA A , 1( )TA A − , T xA L , and T yA L , and (III) perform 1( ) ( )T T xA A A L− and 
1( ) ( )T T yA A A L− . 
 
Figure 5. The proposed parallel implementation of the solving coefficient. 
3.1.1. Calculation of Matrix TA A  
According to Equation (9) and the principle of matrix multiplication, the coefficients of 1 ja ( j = 1, 
2, …, 6) are calculated, as shown in Equation (18) and (19). 
Figure 4. A field programmable gate array (FPGA) architecture for the georeferenced remote sensing
images with second-order polynomial equation and bilinear interpolation scheme. RAM: random access
memory; clk: clock; rst: reset; en: enable.
3.1. FPGA-Based Solution of the Second-Order Polynomial Equation
To implement the second-order polynomial equation based on an FPGA, the processing of solving
the coefficient is decomposed into three modules (as shown in Figure 5): (I) Form the matrices
A , Lx, and Ly; (II) Calculate ATA , (ATA)
−1, ATLx, and ATLy, and (III) perform (ATA)
−1
(ATLx)
and (ATA)−1(ATLy).
Remote Sens. 2019, 11, x FOR PEER REVIEW 8 of 28 
 
 
Figure 4. A field programmable gate rray (FPGA) architec ure for the georef r nced remote sensing 
images with second-order polyn mial equation and bilinear interpolation scheme. RAM: random ac es  
memory; clk: cloc ; rst: reset; en: nable. 
3.1. FPGA-Based Solution of the Second-Order Polynomial Equation 
To implement the second-order polynomial equation based on an FPGA, the processing of 
solving the coefficient is decomposed into three modules (as shown in Figure 5): (I) Form the matrices
A , xL , and yL ; (II) Calculate TA A , 1( )TA A − , T xA L , and T yA L , and (III) perform 1( ) ( )T T xA A A L− and 
1( ) ( )T T yA A A L− . 
 
Figure 5. The proposed parallel implementation of the solving coefficient. 
3.1.1. Calculation of Matrix TA A  
According to Equation (9) and the principle of matrix multiplication, the coefficients of 1 ja ( j = 1, 
2, …, 6) are calculated, as shown in Equation (18) and (19). 
Figure 5. The proposed parallel implementation of the solving coefficient.
3.1.1. Calculation of Matrix ATA
According t Equation (9) and the principle of matrix multiplication, the coefficients of a1j
(j = 1, 2, . . . , 6) are calculated, as shown in Equations (18) and (19).
X1, ··· ,X10 X1, ··· ,X10 A T6x10 ,A\Ox6 
!a = (A' Ar1(A1 Lx) I 
a0 ,al'···a5 
- -
r 
~ GCPdata Y1,- ·· ,Y10 Yi ··· Yio Lxrnxl module RAM 
X1,·· ·,X10 .._ Register X1,·· · , X10 
-
A\,10 ,Aw,, 
-: b=(ATAr\ATLyJ I 
b0 ,bl' ·· ·b, _ 
~ 1/t, · ·, 1/tu _ .l/t, .. ,.I/tu Ly10,1 
-
Coefficient calculation module 
~ 
j (XurYur ) (Xu1 ,_1/111) (x,, ,y,, ) (x1,, y,, ) 
~ 
i (X,,,, Y,,, ) (X,,1 , Y,,1 ) (X,,, Y,, ) (X11, Y11 ) Input image module u 
,-# X p = Xmm + X cso(X p -1) (row,column,gray) lt Y,- Y,, , Ycs, (y,-1) ~ 
Coordmate transformat10n module 
I 
IT( x,y ) = I(1, j)(l - u)(l - v) + I(z + 1, j)u(l -v) Gray(x,y) -
/(i,j) + I (z,J + 1)(1-u)v+ I(t + 1, J + l)uv Longitude -Latitude Bilinear interpolation module 
"-------------------------
1 X1 Y1 X/ X1Y1 Y? ,l X2 Y2 X/ X2Y2 Yl 
~-------------------------------1 
:1 I 
a11 a12 au a1, a1, a10 
[u,, u,,] '[L" ] ' 
U,2 h 1 L21. a=( A' Ar'(A' Lx) 
Uo,U1,·· · Ua 
-4!1 (A'Ar' 
parallel block Lu decompositi on 
A matrix .410, 6 
y,, ",Yrn 
++!I A'Lx1 A'Lx2 A1 Lt1 A'Lx4 A7 Lx, I: 
lxmatrix lxi,H A"1Lxmatrix "A 7 LX,: r~ bo,b1, ··bs 
~==================: ~===============~1 b=(A' Ar'(A'LyJ , 
ly = y1 y2 y1 y~ y, y(; y1 y~ y9 y,o +.iii-I A1 l,y1 A1 l,y2 Ai l,yi Arl.y~ Ari.Yi A1 /,y6 
ly matrix ly11),1 1 :I ATf.y matrix 
(I) (II) (III) 
Remote Sens. 2019, 11, 124 9 of 28
ATA =

A11 A21 A31 A41 A51 A61 A71 A81 A91 A101
A12 A22 A32 A42 A52 A62 A72 A82 A92 A102
A13 A23 A33 A43 A53 A63 A73 A83 A93 A103
A14 A24 A34 A44 A54 A64 A74 A84 A94 A104
A15 A25 A35 A45 A55 A65 A75 A85 A95 A105
A16 A26 A36 A46 A56 A66 A76 A86 A96 A106
×

A11 A12 A13 A14 A15 A16
A21 A22 A23 A24 A25 A26
A31 A32 A33 A34 A35 A36
A41 A42 A43 A44 A45 A46
A51 A52 A53 A54 A55 A56
A61 A62 A63 A64 A65 A66
A71 A72 A73 A74 A75 A76
A81 A82 A83 A84 A85 A86
A91 A92 A93 A94 A95 A96
A101 A102 A103 A104 A105 A106

, (18)

A11
A21
A31
A41
A51
A61
A71
A81
A91
A101

T
×

A11 A12 A13 A14 A15 A16
A21 A22 A23 A24 A25 A26
A31 A32 A33 A34 A35 A36
A41 A42 A43 A44 A45 A46
A51 A52 A53 A54 A55 A56
A61 A62 A63 A64 A65 A66
A71 A72 A73 A74 A75 A76
A81 A82 A83 A84 A85 A86
A91 A92 A93 A94 A95 A96
A101 A102 A103 A104 A105 A106

=

a11
a12
a13
a14
a15
a16

T
. (19)
Generally, ten multipliers and nine adders are needed to compute a11. Therefore, it takes
about 324 additions and 360 multiplications to calculate all coefficients. However, the resources
are limited on an FPGA chip. Therefore, a combination of serial and parallel strategy is adopted [48,49].
To improve the processing speed, multiplier-adder (MD) modules are used. At the same way, the other
coefficients of a1j, a2j, a3j, a4j, a5j, and a6j (j = 1, 2, 3, . . . , 6) are parallel computed at the same clock
cycle. Figure 6 shows the architecture of ATA based on an FPGA.
Remote Sens. 2019, 11, x FOR PEER REVIEW 9 of 28 
 
11 21 31 41 51 61 71 81 91 101
12 22 32 42 52 62 72 82 92 102
13 23 33 43 53 63 73 83 93 103
14 24 34 44 54 64 74 84 94 104
15 25 35 45 55 65 75 85 95 105
16 26 36 46 56 66 76 86 96 106
T
A A A A A A A A A A
A A A A A A A A A A
A A A A A A A A A A
A A
A A A A A
A A A A A A A A A A
A A A A A A A A A A

= 
11 12 13 14 15 16
21 22 23 24 25 26
31 32 33 34 35 36
41 42 43 44 45 46
51 52 53 54 55 56
61 62 63 64 65 66
71 72 73 74 75 76
81 82 83 84 85 86
91 92 93 94 95 96
101 102 103 104 105 106
A A A A A A
A A A A A A
A A A A A A
A A A A A A
A A A A A A
A A A A A A
A A A A A A
A A A A A A
A A A A A A
A A A A A A


×

             
, 
(18)
11 11 12 13 14 15 16
21 21 22 23 24 25 26
31 31 32 33 34 35 36
41 41 42 43 44 45 46
51 51 52 53 54 55 56
61 61 62 63 64 65 66
71 71 72 73 74 76
81 81 82 83
91
101
T
A A A A A A A
A A A A A A A
A A A A A A A
A A A A A A A
A A A A A A
A A A A A
A A A A A
A A A A
A
A
         ×        
11
12
13
14
15
84 85 86 16
91 92 93 94 95 96
101 102 103 104 105 106
T
a
a
a
a
a
A A A a
A A A A A A
A A A A A A
                 =                   
. 
(19)
Generally, ten multipliers and nine adders are needed to compute 11a . Therefore, it takes about 
324 additions and 360 multiplications to calculate all coefficients. However, the resources are limited 
on an FPGA chip. Therefore, a combination of serial and parallel strategy is adopted [48,49]. To 
improve the processing speed, multiplier-adder (MD) modules are used. At the same way, the other 
coefficients of 1 ja , 2 ja , 3 ja , 4 ja , 5 ja , and 6 ja ( j = 1, 2, 3, …, 6) are parallel computed at the same clock 
cycle. Figure 6 shows the architecture of ATA based on an FPGA. 
 
Figure 6. The architectures of TA A . 
3.1.2. LU Decomposition of the Matrix TA A  
The coefficient accuracy of the second-order polynomial equation is crucial to the entire system. 
This paper proposes a parallel structure that uses the floating-point block LU decomposition to solve 
the inverse of TA A . The matrix A can be represented as four matrices 11A , 12A , 21A , and 22A [48– 51], 
as shown in Equation (20). 
Figure 6. The architectures of ATA .
3.1.2. LU Decomposition of the Matrix ATA
The coefficient accuracy of the second-order polynomial equation is crucial to the entire system.
This paper proposes a parallel structure that uses the floating-point block LU decomposition to solve
the inverse of ATA . The matrix A can be represented as four matrices A11, A12, A21, and A22 [48–51],
as shown in Equation (20).
ATA =

a11 a12 a13 | a14 a15 a16
a21 a22 a23 | a24 a25 a26
a31 a32 a33 | a34 a35 a36
− − − − − − −
a41 a42 a43 | a44 a45 a46
a51 a52 a53 | a54 a55 a56
a61 a62 a63 | a64 a65 a66

=
[
A11 A12
A21 A22
]
=
[
L11
L21 L22
]
×
[
U11 U12
U22
]
, (20)
a; ,........,. /---..l---~ 
b; 
AT(l, :) ~ a, i 
A(:, 7) 
A T (2_, :) ~ a2i 
A(:, J) 
A T(S,:) 
A(:, j) 
A T(6,:) 
A(:,j) 
~ as j 
~ a6i 
j = 1, 2, · · · , 6 
Remote Sens. 2019, 11, 124 10 of 28
where A11, A12, A21, A22, L21 and U12 are 3× 3 matrices; L11 and L22 are 3× 3 lower triangular matrices,
U11 and U22 are 3× 3 upper triangular matrices. From Equation (20), some additional equations can
be achieved:
A11 = L11U11, (21)
A12 = L11U12, (22)
A21 = L21U11, (23)
A22 = L21U12 + L22U22. (24)
The steps of LU decomposition are as follows.
Step 1: A11 is performed by block LU decomposition; L11 and U11 are obtained.
Step 2: From Equation (22), U12 can be solved by Equation (25):
U12 = L−111 A12. (25)
Step 3: From Equation (23), L21 can be calculated by the product of A21 and (U11)
−1 (see Equation (26)):
L21 = A21U−111 . (26)
Step 4: A22− A21U12 matrix is performed by LU decomposition; L22 andU22 matrices are obtained.
1. FPGA-Based Parallel Computation for the LU Decomposition of Matrix A11
To implement the LU decomposition of matrix A11 on the FPGA, Equation (21) can be modified
as Equation (27). The steps of the LU decomposition of matrix A11 are as follows.
A11 =
 a11 a12 a13a21 a22 a23
a31 a32 a33
 =
 1l21 1
l31 l32 1
×
 u11 u12 u13u22 u23
u33
 = L11U11, (27)
Step 1. u11 = a11, u12 = a12, u13 = a13, l11 = l22 = l33 = 1, l21 = a21/u11 = a21/a11,
l31 = a31/a11.
Step 2. u22 = a22 − l21u12, u23 = a23 − l21u13.
Step 3. l32 = (a32 − l31u12)/u22.
Step 4. u33 = a33 − l31u13 − l32u23 = a33 − (l31u13 + l32u23).
According to Equation (27), a parallel computation architecture for matrices L11 and U11 is
presented in Figure 7. Five multipliers, three divisors, one adder, and four subtractors are used.
Additionally, some control signals are used to ensure that the results are outputted at the same
clock cycle.
Remote Sens. 2019, 11, 124 11 of 28Remote Sens. 2019, 11, x FOR PEER REVIEW 11 of 28 
 
 
Figure 7. Parallel computation block lower-upper (LU) decomposition of 11A . 
2. FPGA-Based Parallel Computation for Matrices 111( )L − and 111( )U −   
It is well known that the inverse of a lower triangular matrix is another lower triangular matrix, 
and the inverse of an upper triangular matrix is also another upper triangular matrix. The inversion 
of matrices 11L and 11U can be rewritten as Equation (28) and (29), respectively.  
21 21
31 32 31 32
1 1 1
1 1 1
1 1 1
IL L
IL IL L L
          
× =               
, (28) 
where 11 21IL L= − , 32 32IL L= − , and 31 32 21 31IL IL L L= − − . 
11 12 13 11 12 13
22 23 22 23
33 33
1
1
1
IU IU IU U U U
IU IU U U
IU U
          
× =               
, (29) 
where 11 111 /IU U= , 22 221 /IU U= , 33 331 /IU U= , 12 11 12 22 12 11 22( ) / /IU IU U U U U U= − = − ,
23 22 23 33 23 22 33( ) / /IU IU U U U U U= − = − , and 13 11 13 12 23 33( ) /IU IU U IU U U= − + .  
The parallel computation structures for calculating 111( )L − and 111( )U − are shown in Figure 8. Six 
multipliers, four divisors, one subtractor (“—” in the circle), and three reciprocals (“/” in the circle) 
are used. The reverse operation (“-“in the circle) for floating-point is simply to reverse the symbol bit 
and require very few resources. Additionally, some control signals are used to ensure that the results 
are outputted at the same clock cycle. 
 
 
 
(a) 
 
(b) 
Figure 7. Parallel computation block lower-upper (LU) decomposition of A11.
2. FPGA-Based Parallel Computation for Matrices (L11)
−1 and (U11)−1
It is well known that the inverse of a lower triangular matrix is another lower triangular matrix,
and the inverse of an upper triangular matrix is also another upper triangular matrix. The inversion of
matrice L11 and U11 can be rewritten as Equations (28) nd (29), respec ively. 1IL21 1
IL31 IL32 1
×
 1L21 1
L31 L32 1
 =
 1 1
1
, (28)
where IL11 = −L21, IL32 = −L32, and IL31 = −IL32L21 − L31. IU11 IU12 IU13IU22 IU23
IU33
×
 U11 U12 U13U22 U23
U33
 =
 1 1
1
, (29)
where IU11 = 1/U11, IU22 = 1/U22, IU33 = 1/U33, IU12 = −(IU11U12)/U22 = −U12/U11U22,
IU23 = −(IU22U23)/U33 = −U23/U22U33, and IU13 = −(IU11U13 + IU12U23)/U33.
The parallel computation structures for calculating (L11)
−1 and (U11)−1 are shown in Figure 8.
Six multipliers, four divisors, one subtractor (“—” in the circle), and three reciprocals (“/” in the circle)
are used. The r verse operation (“-” in the ci cle) for floating-poi t is simply to reverse the symbol bit
and require very few resources. Additionally, some control signals are used to ensure that the results
are outputted at the same clock cycle.
a11 
a1 2 D-+-+---+-'-lY 
a13 
U 11 1U11 
IU12 
U12 
JU1J 
Un 
U22 JU22 
L21 IL,1 
U23 lU23 JL32 
L32 JL31 U33 1Ui3 
Li, 
Remote Sens. 2019, 11, 124 12 of 28
Remote Sens. 2019, 11, x FOR PEER REVIEW 11 of 28 
 
 
Figure 7. Parallel computation block lower-upper (LU) decomposition of 11A . 
2. FPGA-Based Parallel Computation for Matrices 111( )L − and 111( )U −   
It is well known that the inverse of a lower triangular matrix is another lower triangular matrix, 
and the inverse of an upper triangular matrix is also another upper triangular matrix. The inversion 
of matrices 11L and 11U can be rewritten as Equation (28) and (29), respectively.  
21 21
31 32 31 32
1 1 1
1 1 1
1 1 1
IL L
IL IL L L
          
× =               
, (28) 
where 11 21IL L= − , 32 32IL L= − , and 31 32 21 31IL IL L L= − − . 
11 12 13 11 12 13
22 23 22 23
33 33
1
1
1
IU IU IU U U U
IU IU U U
IU U
          
× =               
, (29) 
where 11 111 /IU U= , 22 221 /IU U= , 33 331 /IU U= , 12 11 12 22 12 11 22( ) / /IU IU U U U U U= − = − ,
23 22 23 33 23 22 33( ) / /IU IU U U U U U= − = − , and 13 11 13 12 23 33( ) /IU IU U IU U U= − + .  
The parallel computation structures for calculating 111( )L − and 111( )U − are shown in Figure 8. Six 
multipliers, four divisors, one subtractor (“—” in the circle), and three reciprocals (“/” in the circle) 
are used. The reverse operation (“-“in the circle) for floating-point is simply to reverse the symbol bit 
and r quire very few resources. Additionally, some control signals are used to ensure that the results 
are outputted at the same clock cycle. 
 
 
 
(a) 
 
(b) 
Figure 8. The architecture FPFA-based for (L11)
−1 and (U11)−1 matrices. (a) The architecture for
solving (L11)
−1. (b) The architecture for solving (U11)−1.
3. FPGA-Based Implementation of Matrices U12 and L21
To implement U12 and L21 based on the FPGA, Equations (25) and (26) can be rewritten into
Equations (30) and (31). In Equation (30), the elements of the matrix contain three formats, i.e., a14,
IL21a14 + a24, and IL31a14 + IL32a24 + a34. In Equation (31), the elements of the matrix contain three
formats, i.e., a41 IU11, a41 IU12 + a42 IU22, and a41 IU13 + a42 IU23 + a43 IU33. The proposed parallel
architecture for calculation of L21 and U12 is depicted in Figure 9. Twelve multipliers, fifteen MD
modules, and three adders are used.
U12 = (L11)−1A12 =
 1IL21 1
IL31 IL32 1
×

a14 a15 a16
a24 a25 a26
a34 a35 a36

=

a14 a15 a16
IL21a14 + a24 IL21a15 + a25 IL21a16 + a26
IL31a14 + IL32a24 + a34 IL31a15 + IL32a25 + a35 IL31a16 + IL32a26 + a36

=
 U12_11 U12_12 U12_13U12_21 U12_22 U12_23
U12_31 U12_32 U12_33

(30)
L21 = A21U11−1 =
 a41 a42 a43a51 a52 a53
a61 a62 a63
×

IU11 IU12 IU13
IU22 IU23
IU33

=

a41 IU11 a41 IU12 + a42 IU22 a41 IU13 + a42 IU23 + a43 IU33
a51 IU11 a51 IU12 + a52 IU22 a51 IU13 + a52 IU23 + a53 IU33
a61 IU11 a61 IU12 + a62 IU22 a61 IU13 + a62 IU23 + a63 IU33

=
 L21_11 L21_12 L21_13L21_21 L21_22 L21_23
L21_31 L21_32 L21_33

(31)
U11 IU11 
JU,, 
U12 
JU" 
Un 
U22 IU22 
L21 JL21 
u23 JU,i JL,2 
L,2 JL,, u,, /U'1 
L,1 
Remote Sens. 2019, 11, 124 13 of 28
Remote Sens. 2019, 11, x FOR PEER REVIEW 12 of 28 
 
Figure 8. The architecture FPFA-based for 111( )L − and 111( )U − matrices. (a) The architecture for solving
1
11( )L − . (b) The architecture for solving 111( )U − . 
3. FPGA-Based Implementation of Matrices 12U and 21L  
To implement 12U and 21L based on the FPGA, Equation (25) and (26) can be rewritten into 
Equation (30) and (31). In Equation (30), the elements of the matrix contain three formats, i.e., 14a , 
21 14 24IL a a+ , and 31 14 32 24 34IL a IL a a+ + . In Equation (31), the elements of the matrix contain three formats, 
i.e., 41 11a IU , 41 12 42 22a IU a IU+ , and 41 13 42 23 43 33a IU a IU a IU+ + . The proposed parallel architecture for 
calculation of 21L and 12U is depicted in Figure 9. Twelve multipliers, fifteen MD modules, and three 
adders are used. 
14 15 16
-
12 11 12 21 24 25 26
31 32 34 35 36
14 15 16
21 14 24 21 15 25 21 16 26
31 14 32 24 34 31 15 32 25 35 31 16 32 26 36
12 _
1
= A 1
1
    
   
a a a
U L IL a a a
IL IL a a a
a a a
IL a a IL a a IL a a
IL a IL a a IL a IL a a IL a IL a a
U
      
= ×         
  
= + + +  + + + + + + 
=
1（ ）
11 12 _ 12 12 _ 13
12 _ 21 12 _ 22 12 _ 23
12 _ 31 12 _ 32 12 _ 33
U U
U U U
U U U
     
. (30)
 
41 42 43 11 12 13
1
21 21 11 51 52 53 22 23
61 62 63 33
41 11 41 12 42 22 41 13 42 23 43 33
51 11 51 12 52 22 51 13 52 23 53 33
61 11 61 12 62
     =
a a a IU IU IU
L A U a a a IU IU
a a a IU
a IU a IU a IU a IU a IU a IU
a IU a IU a IU a IU a IU a IU
a IU a IU a I
−
      
= = ×         
+ + +
+ + +
+ 22 61 13 62 23 63 33
21_ 11 21_ 12 21_ 13
21_ 21 21_ 22 21_ 23
21_ 31 21_ 32 21_ 33
    
U a IU a IU a IU
L L L
L L L
L L L
    + + 
  
=    
, (31)
 
 
(a) 
 
(b) 
Figure 9. Parallel computation method for 21L and 12U . (a) The architecture of 21L . (b) The architecture 
of 12U . 
Figure 9. Par llel computation method for L21 and U12. (a) The architecture of L21. (b) The architecture
of U12.
4. FPGA-Based Implementation of L22 and U22
To implement L22 and U22, Equation (24) can be described by
NewA22 = A22 − L21U12
=
 a44 a45 a46a54 a55 a56
a63 a65 a66
−
 L21_11 L21_12 L21_13L21_21 L21_22 L21_23
L21_31 L21_32 L21_33
×

U12_11 U12_12 U12_13
U12_21 U12_22 U12_23
U12_31 U12_32 U12_33

=
 1L22_21 1
L22_31 L22_32 1
×
 U22_11 U22_12 U22_13U22_22 U22_23
U22_33

(32)
where NewA22 is a 3× 3 matrix, with a similar format of A11. Therefore, the LU decomposition of
NewA22 is not deduced in details. The parallel implementation is shown in Figure 10.
Remote Sens. 2019, 11, x FOR PEER REVIEW 13 of 28 
 
4. FPGA-Based Implementation of 22L and 22U  
To implement 22L and 22U , Equation (24) can be described by 
22 22 21 12
44 45 46 21_ 11 21_ 12 21_ 13 12 _ 11 12 _ 12 12 _ 13
54 55 56 21_ 21_ 2 21_ 23 12 _ 21 12 _ 22 12 _ 23
63 65 66 21_ 31 21_ 32 21_ 33 12 _ 31 12 _ 32 12 _ 33
22 _ 21
22 _ 31
1
1
NewA A L U
a a a L L L U U U
a a a L L L U U U
a a a L L L U U U
L
L
= −
          
= − ×               
=
22 _ 11 22 _ 12 22 _ 13
22 _ 22 22_23
22 _ 32 22 _ 331
U U U
U U
L U
      
×         
, 
(32) 
where 22NewA is a 3 3× matrix, with a similar format of 11A . Therefore, the LU decomposition of 
22NewA is not deduced in details. The parallel implementation is shown in Figure 10. 
 
Figure 10. The architecture of 22L and 22U . 
For the Block LU decomposition, the number of multiplications is about 3 2/ 3 ( % 0.5)n n d n+ − , 
and division is ( 1) / 2d n+ (where d is the dimension of each block, n is the size of matrix). In the 
standard LU decomposition, the number of multiplication operations is about (2 1)( 1) / 6n n n− − n, and 
division is ( 1) / 2n n − [52]. The multipliers and dividers based block LU decomposition are 
approximately reduced 1.02 and 1.25 times than that the traditional LU decomposition. 
3.1.3. FPGA-Based Implantation of the Matrix 1( )TA A − . 
When 11L , 21L , 22L , 11U , 12U , and 22U matrices are solved, the block LU decomposition of matrix 
TA A has been completed,. The processing of inversion matrix TA A is based on the Equation (33). 
1 1 1
11 11 12 11 12 111
21 22 22 22 21 22
-1 -1 -1 -1 -1 -1
11 11 12 21 12 22
-1 -1 -1 -1
22 21 22 22
( )
             =
T L U U U U LA A B
L L U U L L
U L M N M L
U N U L
−
− −
−
        
= = =                 
 +  
, (33)
where 1 1 121 22 21 11N L L L− − −= − and 1 1 112 11 12 22M U U U− − −= − . 
1. FPGA-Based Implementation of 121N− and 112M−  
Figure 11 shows the implementation of 121N− and 112M− , It contains MD_3 module, MD_2 module 
and negative operation.  
Figure 10. The architecture of L22 and U22.
For the Block LU decomposition, the number of multiplications is about n3/3+ (n%d− 0.5)n2,
and division is (d + 1)n/2 (where d is the dimension of each block, n is the size
of matrix). In the standard LU decomposition, the number of multiplication operations is
about n(2n− 1)(n− 1)/6 n, and division is n(n − 1)/2 [52]. The multipliers and dividers based
block LU decomposition are approximately reduced 1.02 and 1.25 times than that the traditional
LU decomposition.
a 41 
a s1 
a 61 
a(4, j) 
a(5, j) 
a(6, j) c::>----------, 
L21_(1,:) 
U12_(:, j) ... 
L21_(1,:) D-----, 
U12_(:, j) 
L21_(1,:) ,}------, 
U12_(:, j) 
L 21 _11 
L ,1_ 21 
L 21 _31 
L ,1_12 
L ,1_22 
L ,1_32 
L 21 _n 
L ,1_ 23 
L 21 _33 
a14 
a2• 
j=l,2,3 
LU matrix 
module 
U 12_ 21 
U 12_31 
U 12_ 22 
U 12 _32 
U 12 23 
U12 _ 33 
U 12 _ 11 
U 12_12 
U 12 _13 
Remote Sens. 2019, 11, 124 14 of 28
3.1.3. FPGA-Based Implantation of the Matrix (ATA)−1
When L11, L21, L22, U11, U12, and U22 matrices are solved, the block LU decomposition of matrix
ATA has been completed,. The processing of inversion matrix ATA is based on the Equation (33).
(ATA)−1 = B =
([
L11
L21 L22
][
U11 U12
U22
])−1
=
[
U11 U12
U22
]−1[
L11
L21 L22
]−1
,
=
[
U−111 L
−1
11 +M
−1
12 N
−1
21 M
−1
12 L
−1
22
U−122 N
−1
21 U
−1
22 L
−1
22
] (33)
where N−121 = −L−122 L21L−111 and M−112 = −U−111 U12U−122 .
1. FPGA-Based Implementation of N−121 and M
−1
12
Figure 11 shows the implementation of N−121 and M
−1
12 , It contains MD_3 module, MD_2 module
and negative operation.Remote Sens. 2019, 11, x FOR PEER REVIEW 14 of 28 
 
 
(a) 
 
 
(b) 
Figure 11. Implementation of 121N− and 112M−  (a) The architecture of 121N− . (b) The architecture of 112M−  
2. FPGA-Based Implementation of 1( )TA A −  
The scheme for 1( )TA A − is shown in Figure 12. Figure 12a is an FPGA implementation of solving 
the 1 1 1 111 11 21 21U L M N− − − −+ matrix. Ten MD_3 modules, three MD_2 modules, two multipliers, and nine 
adders are used to solve the 1 1 1 111 11 21 21U L M N− − − −+ matrix. Figure 12b shows the parallel computation for 
1 1
21 22M L− − matrix, three MD_3 modules, and three MD_2 modules are used. The structure for solving the 
1 1
22 21U N− − matrix is shown in Figure 12c, three MD_3 modules, three MD_2 modules, and three 
multipliers are used. Figure 12d shows the implementation of the 1 122 22U L− − matrix, one MD_3 module, 
two MD_2 modules, and two multipliers are used. The Figure 12a through Figure 12d are parallel 
calculated. Finally, 1( )TA A − matrix is obtained and the elements are outputted at the same clock cycle. 
 
. 121 and M
−1
12 (a) The architecture of N
−1
21 . (b) The architecture of M
−1
12 .
2. FPGA-Based Implementation of (ATA)−1
The scheme for (ATA)−1 is shown in Figure 12. Figure 12a is an FPGA implementation of
Ln (:,l) 
NLi,(3,:) 
Un(:,!) 
U ',, (I, I) 
U',.(2,1) 
U,i (3, 3) ~ 
U,J (3, I) ~ 
u,l(:,3) 
U',,(1,:) 
U'n (2,:) 
NL,,(3,:) 
L,,C,2) 
U,, (:,2) 
u,i (3, 3)~ 
u,!(3,2)~ 
U ,,' (:,2) 
L,.(:,3) 
'IL (I J) Ni,'(1,3) 
·~ 
Un (:,3) 
Mi)(l,1) 
Remote Sens. 2019, 11, 124 15 of 28
solving the U−111 L
−1
11 + M
−1
21 N
−1
21 matrix. Ten MD_3 modules, three MD_2 modules, two multipliers,
and nine adders are used to solve the U−111 L
−1
11 + M
−1
21 N
−1
21 matrix. Figure 12b shows the parallel
computation for M−121 L
−1
22 matrix, three MD_3 modules, and three MD_2 modules are used. The structure
for solving the U−122 N
−1
21 matrix is shown in Figure 12c, three MD_3 modules, three MD_2 modules,
and three multipliers are used. Figure 12d shows the implementation of the U−122 L
−1
22 matrix,
one MD_3 module, two MD_2 modules, and two multipliers are used. The Figure 12a through Figure 12d
are parallel calculated. Finally, (ATA)−1 matrix is obtained and the elements are outputted at the same
clock cycle.
Figure 12. Cont.
L,,(:,l) 
M,d(l,:) MD 3 
M,i(2,:) MD 3 
-
M,-J(3,:) MD 3 
-
N~11 (: , IJ 
U,/ (3,3)o--------t0\ Bii (3, 1) ~ 
N,i (3, 1 )~ 
B,J (1, 1) 
B,i(2,l) 
B,l (3,1) 
L,,(:,2) 
,-M- D- 2---, A,/'(1,2) 
(a) 
(b) 
N,/(:,2) 
MD 
MD_2 1------
A;/ '(2,2) 
B;-J{l,2) 
MD 2 1------<::J 
s,-,1 c2, 2) 
MD 2 1--_ __,,.., 
MD_ 2 .__ _ __,,..., 
MD_2 
R,/(2, 2) 
u,/ (3, 3) ~ 
Ni/(3,2) D-----l6' -
(c) 
L,,(3,3) = 1 
A;/ '(2,3) 
B;/(1,2) 
13,/(3,3) 
L,i(3,3) = 1 
Bi-=/(1,3) 
B,i (2,3) 
B,)(3,3) 
MD 3 
B,/(1,3) 
MD_2 
B2;(2,3) 
U,/(3,3) n-'( 3 3) ~
N,,' (3,3) 
Remote Sens. 2019, 11, 124 16 of 28
Figure 12. The implementation of (ATA)−1. (a) Calculate U−111 L
−1
11 +M
−1
21 N
−1
21 matrix. (b) Solve M
−1
21 L
−1
22
matrix. (c) Calculate U−122 N
−1
21 matrix. (d) Compute M
−1
21 L
−1
22 matrix.
3.1.4. FPGA-Based Implementation of a and b Matrices
From the ten GCPs data, Lx and Ly matrix can be obtained, since a and b matrices have the same
structure, such as ATLx, (ATA)
−1
(ATLx), ATLy, and (ATA)
−1
(ATLy), the implementation of a is
given to illustrate the processes. From Equation (34), six MD_10 modules are adopted. Six MD_6
modules are needed for solving Equation (35). Considering the limited resources of an FPGA, a serial
framework is used to implement a , the strategy is depicted in Figure 13 (such as (ATA)−1(ATLx)).
The details of state transition graph (STG) are as follows:
• S_idle, idle state. When the rst signal is low, the system is in the reset state, all registers
(R0, R1, ..., R5) and other signals are reset.
• S_1 to S_6 are the six different state machines (SMs). Under the different SMs, different row
values of the matrix (ATA)−1 with matrix ATLx are calculated in the MD_6 modules. The result
of MD_6 is serial saved in the registers R5 to R0. When the six SMs are finished, the values of
registers R5 to R0 are parallel outputted to the matrix a .
• S_1, the first SM. When the rst signal goes high, the current state enters the first state when
the enable signal is being received. The values of matrix B−1(1, :) = (ATA)−1(1, :) and ATLx are
put into the MD_6 module for multiplication and addition. The result is saved in the register R5.
• S_2, the second SM. When the S_1 state is complete, the current state enters the second state.
The values of matrix B−1(2, :) = (ATA)−1(2, :) and ATLx begin to calculate multiplication
and addition. Then, the result is saved in the register R4.
Remote Sens. 2019, 11, x FOR PEER REVIEW 16 of 28 
 
illustrate the processes. From Equation (34), six MD_10 modules are adopted. Six MD_6 modules are 
needed for solving Equation (35). Considering the limited resources of an FPGA, a serial framework 
is used to implement a , the strategy is depicted in Figure 13 (such as 1( ) ( )T T xA A A L− ). 
The details of state transition graph (STG) are as follows: 
 S_idle, idle state. When the rst signal is low, the system is in the reset state, all registers (R0, 
R1, ..., R5) and other signals are reset. 
 S_1 to S_6 are the six different state machines (SMs). Under the different SMs, different row 
values of the matrix 1( )TA A − with matrix T xA L are calculated in the MD_6 modules. The result 
of MD_6 is serial saved in the regi ters R5 to R0. When the six SMs are finished, the values 
of registers R5 to R0 are parallel outputted to the matrix a . 
 S_1, the first SM. When the rst signal goes high, the current state enters the first state when 
the enable signal is being received. The values of matrix 1 1(1,:) ( ) (1,:)TB A A− −= and T xA L are put 
into the MD_6 module for multiplication and addition. The result is saved in the register 
R5.  
 S_2, the second SM. When the S_1 state is complete, the current state enters the second state. 
The values of matrix 1 1(2,:) ( ) (2,:)TB A A− −= and T xA L begin to calculate multiplication and 
addition. Then, the result is saved in the register R4.  
In this order, when the S_6 SM (final State) is completed, the matrix a is derived. Under the same 
clock cycle, the values of the registers R5 to R0 ( 0a - 5a ) are parallel outputted. The current state 
returns to the first state. Additionally, each state should contain a certain delay time. The delay time 
should include MD_6 module operation time and serial storage time. 
 
(a)                        (b)                       (c) 
Figure 13. The processing of state transition graph (STG). (a) The state machine. (b) Serial framework. 
(c) Serial input and parallel output. 
11 21 31 41 51 61 71 81 91 101
12 22 32 42 52 62 72 82 92 102
13 23 33 43 53 63 73 83 93 103
14 24 34 44 54 64 74 84 94 104
15 25 35 45 56 65 75 85 95 104
16 26 36 46 56 66 76 86 96 106
T
A A A A A A A A A A
A A A A A A A A A A
A A A A A A A A A A
A Lx
A A A A A A A A A A
A A A A A A A A A A
A A A A A A A A A A

= 

1 1
2 2
3 3
4 4
5
10 6
11 1 21 2 31 3 41 4 51 5 61 6 71 7 81 8 91 9 101 10
12 1 22 2 32 3 42 4 52 5
       
lx Lx
lx Lx
lx Lx
lx Lx
Lx
lx Lx
A lx A lx A lx A lx A lx A lx A lx A lx A lx A lx
A lx A lx A lx A lx A lx
                
=                         
+ + + + + + + + +
+ + + +
=

62 6 72 7 82 8 92 9 102 10
13 1 23 2 33 3 43 4 53 5 63 6 73 7 83 8 93 9 103 10
14 1 24 2 34 3 44 4 54 5 64 6 74 7 84 8 94 9 104 10
15 1 25 2 35 3
A lx A lx A lx A lx A lx
A lx A lx A lx A lx A lx A lx A lx A lx A lx A lx
A lx A lx A lx A lx A lx A lx A lx A lx A lx A lx
A lx A lx A lx
+ + + + +
+ + + + + + + + +
+ + + + + + + + +
+ + + 45 4 55 5 65 6 75 7 85 8 95 9 105 10
16 1 26 2 36 3 46 4 56 5 66 6 76 7 86 8 96 9 106 10
A lx A lx A lx A lx A lx A lx A lx
A lx A lx A lx A lx A lx A lx A lx A lx A lx A lx
       + + + + + + 
+ + + + + + + + +  
, (34)
Figure 13. The proce sing of state transition graph (STG). (a) The state machine. (b) Serial framework.
(c) Serial input and para lel output.
In this order, when the S_6 SM (final State) is completed, the matrix a is derived. Under the same
clock cycle, the values of the registers R5 to R0 (a0–a5) are parallel outputted. The current state returns
MD 3 B2i (l, 1) 
B;-I (2, 1) 
u,I (2, :) MD 2 -
U,i ( 3, 3 b---------to\ B,i ( 3, "2-, 
Lil ( 3, 1) ~
5_3 ~ 
\ 
@ 3 -l 3: 
~I 3-1 (6,:) 
B,i (1, 2) U;-) (1,3) 
MD 2 D 
u ;-1 (2,3) B;-) (2, 2) 
MD 2 D 
u;d (3,3) ~ U,i (3,3) 
r;; (3, 2) D-----'0/ ~ D 
(d) 
5_1 
5_3 
5_4 
5_5 
5_6 
B;-I (1, 3) 
a 
B,i(2,3) 
a 
B,I (3, 3) 
a 
Register 
Remote Sens. 2019, 11, 124 17 of 28
to the first state. Additionally, each state should contain a certain delay time. The delay time should
include MD_6 module operation time and serial storage time.
ATLx =

A11 A21 A31 A41 A51 A61 A71 A81 A91 A101
A12 A22 A32 A42 A52 A62 A72 A82 A92 A102
A13 A23 A33 A43 A53 A63 A73 A83 A93 A103
A14 A24 A34 A44 A54 A64 A74 A84 A94 A104
A15 A25 A35 A45 A56 A65 A75 A85 A95 A104
A16 A26 A36 A46 A56 A66 A76 A86 A96 A106


lx1
lx2
lx3
lx4
...
lx10

=

Lx1
Lx2
Lx3
Lx4
Lx5
Lx6

=

A11lx1 + A21lx2 + A31lx3 + A41lx4 + A51lx5 + A61lx6 + A71lx7 + A81lx8 + A91lx9 + A101lx10
A12lx1 + A22lx2 + A32lx3 + A42lx4 + A52lx5 + A62lx6 + A72lx7 + A82lx8 + A92lx9 + A102lx10
A13lx1 + A23lx2 + A33lx3 + A43lx4 + A53lx5 + A63lx6 + A73lx7 + A83lx8 + A93lx9 + A103lx10
A14lx1 + A24lx2 + A34lx3 + A44lx4 + A54lx5 + A64lx6 + A74lx7 + A84lx8 + A94lx9 + A104lx10
A15lx1 + A25lx2 + A35lx3 + A45lx4 + A55lx5 + A65lx6 + A75lx7 + A85lx8 + A95lx9 + A105lx10
A16lx1 + A26lx2 + A36lx3 + A46lx4 + A56lx5 + A66lx6 + A76lx7 + A86lx8 + A96lx9 + A106lx10

,
(34)
a = (ATA)
−1
(ATLx) =
[
B−111 B
−1
12
B−121 B
−1
22
]

Lx1
Lx2
Lx3
Lx4
Lx5
Lx6

=

a0
a1
a2
a3
a4
a5

. (35)
3.2. FPGA-Based Implementation of Coordinate Transformation and Bilinear Interpolation Algorithm
Figure 14 shows the flowchart for the coordinate transformation and bilinear interpolation method,
which consists of the transformation from the output image coordinates to the ground coordinates,
conversion the ground coordinate to the raw image coordinates, and bilinear interpolation.
Remote Sens. 2019, 11, x FOR PEER REVIEW 17 of 28 
 
3.2. FPGA-Based Implementation of Coordinate Transformation and Bilinear Interpolation Algorithm  
Figure 14 shows the flowchart for the coordinate transformation and bilinear interpolation 
method, which consists of the transformation from the output image coordinates to the ground 
coordinates, conversion the ground coordinate to the raw image coordinates, and bilinear 
interpolation.  
 
Figure 14. The flowchart of the coordinate transformation and bilinear interpolation. 
1. FPGA-Based Implementation of gX and gY  
The purpose of this step is to obtain gX and gY in Equation (12) of Section 2.1.2, as shown in Figure 
15. In this part, 32-bit integer and shift register are used to reduce the resource utilization of an FPGA. 
To implement the shift operation, GSDX and GSDY are approximated as 2m or 2 2m n− ( m and n are integers). 
For instance, 30GSD GSDX Y= = , which can be expressed as the form of 52 2− . In Figure 15, the symbol "<<" 
in the circle represents the left shift operation. 
 
Figure 15. Parallel computation method for gX and gY . 
2. FPGA-Based Implementation of Bilinear Interpolation  
When gX and gY are calculated, the coordinates of _x row and _y column are obtained from Equation 
(14) and (15). The bilinear interpolation algorithm is performed based on the _x row and _y column as 
shown in Figure 16. As observed from Figure 16, the processing of bilinear interpolation is divided 
into three steps. 
1 0
2 1
1 1
3 211 121
1 1
4 321 22
5 4
6 5
( ) ( )T T x
Lx a
Lx a
Lx aB B
a A A A L
Lx aB B
Lx a
Lx a
− −
−
− −
             
= = =                      
. (35)
Figure 14. The flowchart of the coordinate transformation and bilinear interpolation.
1. FPGA-Based Implementation of Xg and Yg
The purpose of this step is to btain Xg and Yg in Equation (12) of Section 2.1.2, as shown in
Figure 15. In this part, 32-bit integer and shift register are used to reduce the resource utilization
of an FPGA. To implement the shift operation, XGSD and YGSD are approximated as 2m or 2m − 2n
(m and n are integers). For instance, XGSD = YGSD = 30, which can be expressed as the form
of 25 − 2. In Figure 15, the symbol “<<” in the circle represents the left shift operation.
The ground coordinates i.-- --, 
(Xg,Yg) 
The coordinates of output 
image (xr,YP) 
The raw coordinats (x,y) 
No 
1' bl 
X 
l'bl 
y 
Whether in the 
original image 
? 
No 
Bilinear 
interpolation Save data 
Xs 
Ys 
Remote Sens. 2019, 11, 124 18 of 28
Remote Sens. 2019, 11, x FOR PEER REVIEW 17 of 28 
 
3.2. FPGA-Based Implementation of Coordinate Transformation and Bilinear Interpolation Algorithm  
Figure 14 shows the flowchart for the coordinate transformation and bilinear interpolation 
method, which consists of the transformation from the output image coordinates to the ground 
coordinates, conversion the ground coordinate to the raw image coordinates, and bilinear 
interpolation.  
 
Figure 14. The flowchart of the coordinate transformation and bilinear interpolation. 
1. FPGA-Based Implementation of gX and gY  
The purpose of this step is to obtain gX and gY in Equation (12) of Section 2.1.2, as shown in Figure 
15. In this part, 32-bit integer and shift register are used to reduce the resource utilization of an FPGA. 
To implement the shift operation, GSDX and GSDY are approximated as 2m or 2 2m n− ( m and n are integers). 
For instance, 30GSD GSDX Y= = , which can be expressed as the form of 52 2− . In Figure 15, the symbol "<<" 
in the circle represents the left shift operation. 
 
Figure 15. Parallel computation method for gX and gY . 
2. FPGA-Based Implementation of Bilinear Interpolation  
When gX and gY are calculated, the coordinates of _x row and _y column are obtained from Equation 
(14) and (15). The bilinear interpolation algorithm is performed based on the _x row and _y column as 
shown in Figure 16. As observed from Figure 16, the processing of bilinear interpolation is divided 
into three steps. 
1 0
2 1
1 1
3 211 121
1 1
4 321 22
5 4
6 5
( ) ( )T T x
Lx a
Lx a
Lx aB B
a A A A L
Lx aB B
Lx a
Lx a
− −
−
− −
             
= = =                      
. (35)
Figure 15. Parallel computation method for Xg and Yg.
2. FPGA-Based Implementation of Bilinear Interpolation
When Xg and Yg are calcul ted, the coordinates f x_row and y_column are obtained from
Equations (14) and (15). The bilinear interpolati n algorithm is performed bas on the x_r w
and y_column as shown in Figure 16. As observed from Figure 16, the processing of bilinear
interpolation is divided into three steps.Remote Sens. 2019, 11, x FOR PEER REVIEW 18 of 28 
 
 
Figure 16. The diagram of bilinear interpolation. 
In step (I), the integer part i , j , the fractional part u , v and the weight part 1 u− and 1 v− of the 
floating-point ( _ , _ )x row y column coordinates are computed, respectively. Figure 17 shows the parallel 
implementation method for i , j , u , v , 1 u− and 1 v− . Four subtractors, two INT (integer) modules and 
two absolute modules (“||” in the circle) are used. 
 
Figure 17. The parallel implementation method for i , j , u , v , 1 u− and 1 v− . 
In step (II), it is mainly to calculate the address of four nearest neighbor pixels and read the 
corresponding gray value. The address of memory can be calculated from Equation (17) of Section 
2.2.2. The processing of reading gray value is shown in Figure 18. To reduce the resources of an FPGA, 
the multiplication is converted into left shift operation. 
 
Figure 18. The architecture for parallel reading gray values 
After the values of i , j , u , v , 1 u− , 1 v− , ( , )I i j , ( 1, )I i j+ , ( , 1)I i j + , and ( 1, 1)I i j+ + are computed, 
the gray value of the georeferenced image can be calculated in step (III). According to Equation (13) 
and (16). The implementation of bilinear interpolation algorithm is shown in Figure 19. Eight 
multipliers and three adders are used to compute the value of ( , _ )I x row y column− . 
i r . i r f ili r i t r l ti .
In step (I), the integer part i, j, the fractional part u, v and the weight part 1 − u and 1 − v
of the floating-point (x_row, y_column) coordinates are computed, respectively. Figure 17 shows
the parallel implementation method for i, j, u, v, 1− u and 1− v. Four subtractors, two INT (integer)
modules and two absolute modules (“||” in the circle) are used.
Remote Sens. 2019, 11, x FOR PEER REVIEW 18 of 28 
 
 
Figure 16. The diagram of bilinear interpolation. 
In step (I), the integer part i , j , the fractional part u , v and the weight part 1 u− and 1 v− of the 
floating-poi t ( _ , _ )x row y column c ordi ates are comput d, res ectively. Figure 17 shows the parallel 
implementation method f r i , j , u , v , 1 u− and 1 v− . Four subtractors, two INT (integer) modules and 
two absolute modules (“||” in the circle) are used. 
 
Figure 17. The parallel implementation method for i , j , u , v , 1 u− and 1 v− . 
In step (II), it is mainly to calculate the address of four nearest neighbor pixels and read the 
corresponding gray value. The address of memory can be calculated from Equation (17) of Section 
2.2.2. The processing of reading gray value is shown in Figure 18. To reduce the resources of an FPGA, 
the multiplication is converted into left shift operation. 
 
Figure 18. The architecture for parallel reading gray values 
After the values of i , j , u , v , 1 u− , 1 v− , ( , )I i j , ( 1, )I i j+ , ( , 1)I i j + , and ( 1, 1)I i j+ + are computed, 
the gray value of the georeferenced image can be calculated in step (III). According to Equation (13) 
and (16). The implementation of bilinear interpolation algorithm is shown in Figure 19. Eight 
multipliers and three adders are used to compute the value of ( , _ )I x row y column− . 
Figure 17. The parallel implementation method for i, j, u, v, 1− u and 1− v.
In step (II), it is mainly to calculate the address of four nearest neighbor pixels and read
the corresponding gray value. The address of memory can be calculated from Equation (17)
of Section 2.2.2. The processing of reading gray value is shown in Figure 18. To reduce the resources of
an FPGA, the multiplication is converted into left shift operation.
j 
l'bl r)---r----
y J--._--t--.....,_~y Y, 
fu lnt32 to x _row =ao +a1Xg + a2Y'? + a3Xf + a4XgYg + asY'? 
r--"+i floating y _column =bi, +b,Xs +b,Y, +b3X: +b4Xg Y, +b,Y, 
h column x _ row J. 
~ i=INT(x _row) I u=fraction(x _row) Gray_row .,._ 
buffer j = INT(y column) 
=x _row- INT(x _row) 
(II) delta_u = l-u I v=fraction(y _ column)= 
-irn delta _ v =l- v y _column-INT(y _column) (I) 
... i ... 
I 
I(x _ row,y _ calm) = I(i, j)( l - u)(l - v) + I(i + 1, j)u(l - v) 
+l(i, j + 1)(1-u)v+ 111;/, j + l)uv 
I 
x _row ~ 
y _colum~ INT 
l-u 
u t~~
t~~l," 
!(i+l,j + I) 
1 'di c:>---+--+---_..,_..!/ 
Gray_row 
buffer 
l(i,j+l) 
!(i+l ,J) 
Remote Sens. 2019, 11, 124 19 of 28
Remote Sens. 2019, 11, x FOR PEER REVIEW 18 of 28 
 
 
Figure 16. The diagram of bilinear interpolation. 
In step (I), the integer part i , j , the fractional part u , v and the weight part 1 u− and 1 v− of the 
floating-point ( _ , _ )x row y column coordinates are computed, respectively. Figure 17 shows the parallel 
implementation method for i , j , u , v , 1 u− and 1 v− . Four subtractors, two INT (integer) modules and 
two absolute modules (“||” in the circle) are used. 
 
Figure 17. The parallel implementation method for i , j , u , v , 1 u− and 1 v− . 
In step (II), it is mainly to calculate the address of four nearest neighbor pixels and read the 
corresponding gray value. The address of memory can be calculated from Equation (17) of Section 
2.2.2. The processing of reading gray value is shown in Figure 18. To reduce the resources of an FPGA, 
the multiplication is converted into left shift operation. 
 
Figure 18. The architecture for parallel reading gray values 
After the values of i , j , u , v , 1 u− , 1 v− , ( , )I i j , ( 1, )I i j+ , ( , 1)I i j + , and ( 1, 1)I i j+ + are computed, 
the gray value of the georeferenced image can be calculated in step (III). According to Equation (13) 
and (16). The implementation of bilinear interpolation algorithm is shown in Figure 19. Eight 
multipliers and three adders are used to compute the value of ( , _ )I x row y column− . 
Figure 18. The architecture for parallel reading gray values.
After the values of i, j, u, v, 1 − u, 1 − v, I(i, j), I(i + 1, j), I(i, j + 1), and I(i + 1, j + 1)
are computed, the gray value of the georeferenced image can be calculated in step (III). According to
Equations (13) and (16). The implementation of bilinear interpolation algorithm is shown in Figure 19.
Eight multipliers and three adders are used to compute the value of I(x− row, y_column).Remote Sens. 2019, 11, x FOR PEER REVIEW 19 of 28 
 
 
Figure 19. The parallel computation for bilinear interpolation 
4. Experiment and Performance Analysis 
4.1. The Software and Hardware Environment 
The proposed scheme is implemented on a custom-designed board which contains a Xilinx 
Virtex-7-XC7VX980t-ffg1930-1 FPGA that has 612,000 logic cells, 1,500kB Block RAM, 1,224,000 Flip-
Flops, and 3,600 DSP slices. Additionally, the design tool is Vivado 2014.2, the simulation tool is 
ModelSim SE-64 10.4, and the hardware design language is Verilog HDL. To validate the proposed 
method, the georeferenced algorithm is also implemented by MATLAB R2014a, Visual Studio 2015 
(C++) and ENVI 5.3 on a PC equipped with an Intel (R) Core i7-4790 CPU @ 3.6GHz and 8GB RAM, 
running Windows 7 (64 bit). 
4.2. Data  
To validate the proposed FPGA-based algorithm, two data sets are used to perform the 
georeferencing. The first data set is acquired from the ENVI example dataset, i.e., bldt_tm.img and 
bldt_tm.pts. The second data set is obtained from the ERDAS example dataset, i.e., tmAtlanta.img 
and panAtanta.img. Figure 20(a) and (b) display the raw data sets of bldt_tm.img and tmAtlanta.img, 
respectively. The image size is 256×256 pixels2. The information of two datasets is shown in Table 2. 
 
(a) 
 
(b) 
Figure 20. The raw image. (a) The first raw image. (b) The second raw image. 
Table 2. The information of two data sets 
Image Projection Zone Resolution (m) Band Wave length (μm) 
bldt_tm.img UTM 13 30×30 3 0.63~0.69 
tmAtlanta.img State plan (NAD 27) 3676 30×30 4 0.76~0.90 
4.3 Processing Performance  
4.3.1. Error Analysis 
Figure 19. The parallel computation for bilinear interpolation.
4. Experiment and Performance Analysis
4.1. The Software and Hardware Environment
The proposed scheme is implemented on a custom-designed board which contains
a Xilinx Virtex-7-XC7VX980t-ffg1930-1 FPGA that has 612,000 logic cells, 1,500 kB Block RAM,
1,224,000 Flip-Flops, and 3,6 0 DSP slices. Additionally, the design tool is Vivado 2014.2, the simulation
to l is ModelSim E-64 10.4, and the hardware design langu ge is Verilog HDL. To validate
the proposed method, the georeferenced algorithm is also implemented by MATLAB R2014a,
Visual Studio 2015 (C++) and ENVI 5.3 on a PC equipp d with an Intel (R) Core i7-4790 CPU @ 3.6 GHz
and 8 GB RAM, runni g Windows 7 (64 bit).
4.2. Data
To validate the proposed FPGA-based algorithm, two data sets are used to perform
the georeferencing. The first data set is acquired from the ENVI example dataset, i.e., bldt_t .img
and bldt_tm.pts. The second data set is obtained from the ERDAS example dataset, i.e., tmAtlanta.img
and panAtanta.img. Figure 20a,b display the raw data sets of bldt_tm.img and t Atlanta.i g,
respectively. The image size is 256 × 256 pixels2. The information of two datasets is shown in Table 2.
l-v 
l-u 
I(i+ l, j) c:>---+-+-------t· 
u 
V 
I(i+ l, j+ 1) 
Gray_row 
buffer 
J(i-1,j+l) 
!(i,j+I) 
!(i+l,j) 
f(i,j) 
J(x_row,y _column) 
Remote Sens. 2019, 11, 124 20 of 28
Remote Sens. 2019, 11, x FOR PEER REVIEW 19 of 28 
 
 
Figure 19. The parallel computation for bilinear interpolation 
4. Experiment and Performance Analysis 
4.1. The Software and Hardware Environment 
The proposed scheme is implemented on a custom-designed board which contains a Xilinx 
Virtex-7-XC7VX980t-ffg1930-1 FPGA that has 612,000 logic cells, 1,500kB Block RAM, 1,224,000 Flip-
Flops, and 3,600 DSP slices. Additionally, the design tool is Vivado 2014.2, the simulation tool is 
ModelSim SE-64 10.4, and the hardware design language is Verilog HDL. To validate the proposed 
method, the georeferenced algorithm is also implemented by MATLAB R2014a, Visual Studio 2015 
(C++) and ENVI 5.3 on a PC equipped with an Intel (R) Core i7-4790 CPU @ 3.6GHz and 8GB RAM, 
running Windows 7 (64 bit). 
4.2. Data  
To validate the proposed FPGA-based algorithm, two data sets are used to perform the 
georeferencing. The first data set is acquired from the ENVI example dataset, i.e., bldt_tm.img and 
bldt_tm.pts. The second data set is obtained from the ERDAS example dataset, i.e., tmAtlanta.img 
and panAtanta.img. Figure 20(a) and (b) display the raw data sets of bldt_tm.img and tmAtlanta.img, 
respectively. The image size is 256×256 pixels2. The information of two datasets is shown in Table 2. 
 
(a) 
 
(b) 
Figure 20. The raw image. (a) The first raw image. (b) The second raw image. 
Table 2. The information of two data sets 
Image Projection Zone Resolution (m) Band Wave length (μm) 
bldt_tm.img UTM 13 30×30 3 0.63~0.69 
tmAtlanta.img State plan (NAD 27) 3676 30×30 4 0.76~0.90 
4.3 Processing Performance  
4.3.1. Error Analysis 
Figure 20. The raw image. (a) The first raw image. (b) The second raw image.
Table 2. The information of two data sets.
Image Projection Zone Resolution (m) Band Wave Length (µm)
bldt_tm.img UTM 13 30 × 30 3 0.63~0.69
tmAtlanta.img State plan (NAD 27) 3676 30 × 30 4 0.76~0.90
4.3. Processing Performance
4.3.1. Error Analysis
To quantitatively evaluate the accuracy of the georeferencing implemented using FPGA, the root
mean squared error (RMSE) is used in Equation (36) through Equation (38) [45].
RMSEx =
√
∑nk = 1 (x
′
k − xk)2
n
, (36)
RMSEy =
√
∑nk = 1 (y
′
k − yk)2
n
, (37)
RMSE =
√
RMSE2x + RMSE2y, (38)
where x′k and y
′
k are coordinates of the georeferenced image which are computed by the proposed
method; xk and yk are reference geodetic coordinates; and n is the number of check points (CPs).
To compute the root mean squared errors (RMSEs), one hundred CPs are selected as shown
in Figure 21.
With the experimental results, a few conclusions can be drawn as follows.
(1) From Equation (36) through (38), the RMSEs are computed based on FPGA, MATLAB,
Visual Studio (C++), and ENVI, as shown in Table 3. It can be concluded that the RMSEx, RMSEy,
and RMSE of the georeferenced image of the first data set implemented using FPGA are 0.1441,
0.1672, and 0.2207 pixels, respectively. The accuracy of the georeferenced using FPGA has the same
values when implemented by MATLAB and Visual studio (C++), and is close to the values
implemented using ENVI software. Additionally, other statistics, such as maximum, minimum
and mean error are computed and listed in Table 4. It can be found that the proposed FPGA-based
algorithm has a mean error of x and y coordinates at 0.0775 pixels and 0.0945 pixels, respectively;
the minimum and the maximum errors of the x and y coordinates are 0.0008 and 0.0026 pixels,
and 0.5113 and 0.4038 pixels, respectively. The accuracy of the various errors are calculated by
the proposed algorithm has the same values than those based on MATLAB and Visual Studio (C++),
and has a close accuracy than that based on ENVI software.
Remote Sens. 2019, 11, 124 21 of 28
Remote Sens. 2019, 11, x FOR PEER REVIEW 20 of 28 
 
To quantitatively evaluate the accuracy of the georeferencing implemented using FPGA, the root 
mean squared error (RMSE) is used in Equation (36) through Equation (38) [45]. 
, 2
1
( )n kkk
x
x x
RMSE
n
=
−
=
 , (36) 
, 2
1
( )n kkk
y
y y
RMSE
n
=
−
=
 , (37) 
2 2
x yRMSE RMSE RMSE= + , (38) 
where ,kx and ,ky are coordinates of the georeferenced image which are computed by the proposed 
method; kx and ky are reference geodetic coordinates; and n is the number of check points (CPs).  
To compute the root mean squared errors (RMSEs), one hundred CPs are selected as shown in 
Figure 21.  
 
(a) 
 
(b) 
Figure 21. Check points distribution in: (a) the first image and (b) the second image. 
With the experimental results, a few conclusions can be drawn as follows.  
(1) From Equation (36) through (38), the RMSEs are computed based on FPGA, MATLAB, Visual 
Studio (C++), and ENVI, as shown in Table 3. It can be concluded that the RMSEx, RMSEy, and RMSE 
of the georeferenced image of the first data set implemented using FPGA are 0.1441, 0.1672, and 
0.2207 pixels, respectively. The accuracy of the georeferenced using FPGA has the same values when 
implemented by MATLAB and Visual studio (C++), and is close to the values implemented using 
ENVI software. Additionally, other statistics, such as maximum, minimum and mean error are 
computed and listed in Table 4. It can be found that the proposed FPGA-based algorithm has a mean 
error of x and y coordinates at 0.0775 pixels and 0.0945 pixels, respectively; the minimum and the 
maximum errors of the x and y coordinates are 0.0008 and 0.0026 pixels, and 0.5113 and 0.4038 pixels, 
respectively. The accuracy of the various errors are calculated by the proposed algorithm has the 
same values than those based on MATLAB and Visual Studio (C++), and has a close accuracy than 
that based on ENVI software. 
Table 3. The values of root mean square errors (RMSEs) of the georeferenced image of the first image. 
Platform RMSEx RMSEy RMSE 
FPGA 0.1441 0.1672 0.2207 
MATLAB 0.1441 0.1672 0.2207 
Visual Studio 
(C++) 0.1441 0.1672 0.2207 
ENVI 0.1442 0.1699 0.2229 
i . i t i t i ti i : ( ) t fi t i ( ) t i .
Table 3. The values of root mean square errors (RMSEs) of the georeferenced image of the first image.
Platform RMSEx RMSEy RMSE
FPGA 0.1441 0.1672 0.2207
MATLAB 0.1441 0.1672 0.2207
Visual Studio (C++) 0.1441 0.1672 0.2207
ENVI 0.1442 0.1699 0.2229
Table 4. Statistics values of the georeferenced image of the first image.
Platform Coordinate Max Error (pixel) Min Error (pixel) Mean Error (pixel)
FPGA
x 0.5113 0.0008 0.0775
y 0.4038 0.0026 0.0945
MATLAB
x 0.5113 0.0008 0.0775
y 0.4038 0.0026 0.0945
Visual Studio (C++)
x 0.5113 0.0008 0.0775
y 0.4038 0.0026 0.0945
ENVI
x 0.5252 0.0008 0.0829
y 0.5252 0.0008 0.1031
(2) Table 5 lists the RMSEs of the georeferenced image of the second data set. The values of
the RMSEx, RMSEy, and RMSE implemented by FPGA are 0.0965, 0.1268, and 0.1593 pixels, respectively.
TheRMSEs implemented using FPGA have the same accuracy than those based on MATLAB and Visual
Studio (C++), and have a close accuracy implemented using ENVI software However, it can be
considered acceptable for the absolute error is less than one pixel [53]. In addition, maximum,
minimum and mean errors are computed and listed in Table 6. It can be found that the mean errors
implemented using FPGA of the x and y coordinates are 0.0789 and 0.1144 pixels, respectively,
and the minimum and maximum errors for the x and y coordinates are 0.0008 and 0.0046 pixels,
and 0.2613 and 0.2081 pixels, respectively.
Table 5. The values of RMSEs of the georeferenced image of the second image.
Platform RMSEx RMSEy RMSE
FPGA 0.0965 0.1268 0.1593
MATLAB 0.0965 0.1268 0.1593
Visual Studio (C++) 0.0965 0.1268 0.1593
ENVI 0.0897 0.1009 0.1350
✓ 
Remote Sens. 2019, 11, 124 22 of 28
Table 6. Statistics values of the georeferenced image of the second image.
Platform Coordinate Max Error (pixel) Min Error (pixel) Mean Error (pixel)
FPGA
x 0.2613 0.0008 0.0789
y 0.2081 0.0046 0.1144
MATLAB
x 0.2613 0.0008 0.0789
y 0.2081 0.0046 0.1144
Visual Studio (C++) x 0.2613 0.0008 0.0789y 0.2081 0.0046 0.1144
ENVI
x 0.4039 0.0003 0.0588
y 0.4039 0.0003 0.0673
As observed form Table 3 through 6, the accuracy of the georeferenced image when implemented
using FPGA can reach the requirements because its RMSEs are less than one pixel [54].
Figure 22a–d show the georeferenced images of the first data set implemented by FPGA, MATLAB,
Visual Studio (C++) and ENVI, respectively. Figure 23a–d show georeferenced images of the second
image implemented by FPGA, MATLAB, Visual Studio (C++), and ENVI, respectively.
Remote Sens. 2019, 11, x FOR PEER REVIEW 22 of 28 
 
As  form Table 3 throug  6, the accuracy f the g oreferenced image when 
implemented using FPGA can reach the requirements because its RMSEs are less than one pixel [54]. 
Figure  sho  the geor f renced images of the first data s t implemented by FPG , 
MATLAB, V sual Studio (C++) and ENVI, respectively. Figure 23a-23d show georeferenced images 
of the second imag  implemented by FPGA, MATLAB, Visual Stu io (C++), and ENVI, respectively. 
Figure 22. The georeferenced image of the first data set. (a) By FPGA. (b) By MATLAB. (c) By Visual 
Studio (C++). (d) By ENVI 5.3. 
(a) (b) (c) (d) 
Figure 23. The georeferenced image of the second data set. (a) By FPGA. (b) By MATLAB. (c) By 
Visual Studio (C++). (d) By ENVI 5.3. 
4.3.2. Gray Value Comparison 
To verify the accuracy of gray value, the gray levels of georeferenced image are compared to 
those implemented using FPGA, MATLAB, Visual Studio (C++), and ENVI software. The 
georeferenced image implemented using FPGA as a referencing image, called “Ref-Img”. The 
georeferenced image implemented using MATLAB, Visual Studio (C++), and ENVI are called “Img-
MATLAB”, “Img-C++”, and “Img-ENVI”, respectively. The gray differences between the Ref-Img 
and those georeferenced images implemented by MATLAB, Visual Studio (C++), and ENVI are 
obtained and shown in Figure 24 and 25. 
As observed from Figure 24, the gray values of Figure 24a and Figure 24b are 0, which means 
the proposed method implemented using FPGA has the same accuracy with the implemented by 
MATLAB, and Visual Studio (C++) [55,56]. Figure 24c indicates that the difference image between 
Ref-Img and Img-ENVI has many pixels are not same. The mean values of Figure 24a–c are 0, 0, and 
0.713 respectively, which are less than 1pixel. 
As observe from Figure 25, the proposed method has the same accuracy as those implemented 
using MATLAB, and Visual Studio (C++), and has a small portion of the gray values are different 
from the Img-ENVI. The mean values of the Figure 25a–c are 0, 0, and 0.1265, respectively, which are 
all less than 1 pixel. 
 
(a) (b) (c) (d) 
Figure 2. The georeferenced image of the first data set. (a) By FPGA. (b) By MATLAB. (c) By Visual
Studio (C +). (d) By ENVI 5.3.
Remote Sens. 2019, 11, x FOR PEER REVIEW 22 of 28 
 
As observed form Table 3 through 6, the accuracy of the georeferenced image when 
implemented using FPGA can reach the requirements because its RMSEs ar  l ss than one ixel [54]. 
Figure 22 –d show th  g oref r nced images of the first data et impleme ted by FPGA, 
MATLAB, Visual Studio (C++) and ENVI, resp ctively. Figure 23a-23d show geor fer nced images 
of the second image implemented by FPGA, MATLAB, Vis al Studi  (C++), and ENVI, respectively. 
Figure 22. The georeferenced image of the first data set. (a) By FPGA. (b) By MATLAB. (c) By Visual 
Studio (C++). (d) By ENVI 5.3. 
(a) (b) (c) (d) 
Figure 23. The georeferenced image of the second data set. (a) By FPGA. (b) By MATLAB. (c) By 
Vis al Studio (C++). (d) By ENVI 5.3.
4.3.2. Gray Value Comparison 
To verify the accuracy of gray value, the gray levels of georeferenced image are compared to 
those implemented sing FPGA, MATLAB, Visual Studio (C++), and ENVI software. The 
georeferenc d image implemented using FPGA as a referencing image, called “Ref-Img”.  
f  i a e implemented using MATL B, Visual Studio (C++), and ENVI are called “Img-
MATLAB”, “Img-C++”, and “Img-ENVI”, respectively. The gray differences between the Ref-Img 
and those georeferenced images implemented by MATLAB, Visual Studio (C++), and ENVI are 
obtained and shown in Figure 24 and 25. 
As observed from i  , the gray values of Figure 24a and Figure 24b are 0, which means 
the proposed meth d implemented using FPGA has the same ccuracy with the implemented by 
MATLAB, and Visual Studio (C++) [55,56]. Figure 24c indicat s that the difference image between 
Ref-Img and Img-ENVI has many pixels are not same. The mean values of Figure 24a–c are 0, 0, and 
0.713 respectively, whic  re less than 1pix l. 
As obser  from Figu  25, the pro osed method has the same accuracy as those implemented 
using MATLAB, and Vis al Studio (C++), and has a small portion of the gray valu s are different 
from the Img-ENVI. The mean values of the Figure 25a–c are 0, 0, a d 0.1265, respectively, which are 
all less than 1 pixel. 
 
(a) (b) (c) (d) 
Figure 23. The i age of the second at set. (a) By FPGA. (b) By MATL B. (c) By Visual
St dio (C++). (d) By ENVI 5.3.
4.3.2. Gray Value Comparison
To verify the accuracy f gray value, the gray levels of georeferenced image are compared to those
implemented using FPGA, MATLAB, Visual Studio (C++), and ENVI software. The georeferenced
image implemented using FPGA as a referencing image, called “Ref-Img”. The georeferenced image
implemented using MATLAB, Visual Studio (C++), and ENVI are called “Img-MATLAB”, “Img-C++”,
and “Img-ENVI”, respectively. The gray differences between the Ref-Img and those georeferenced
images implemented by MATLAB, Visual Studio (C++), and ENVI are obtained and shown in
Figures 24 and 25.
Remote Sens. 2019, 11, 124 23 of 28
Remote Sens. 2019, 11, x FOR PEER REVIEW 23 of 28 
 
 
(a) 
 
(b) 
 
(c) 
Figure 24. The differences between referencing image and other georeferenced images of the first 
image. (a) The differences between Ref-Img and Img-C++. (b) The differences between Ref-Img and 
Img-MATLAB. (c) The differences between Ref-Img and Img-ENVI. 
 
(a) 
 
(b) 
 
(c) 
Figure 25. The differences between referenced image and other georeferenced images of the second 
image. (a) The differences between Ref-Img and Img-C++. (b) The differences between Ref-Img and 
Img-MATLAB. (c) The differences between Ref-Img and Img-ENVI. 
4.3.3. Resource Occupation Analysis 
Figure 24. The differences between referencing image and other georeferenced images of the first image.
(a) The differences between Ref-Img and Img-C++. (b) The differences between Ref-Img and Img-MATLAB.
(c) The differences between Ref-Img and Img-ENVI.
Remote Sens. 2019, 11, x FOR PEER REVIEW 23 of 28 
 
 
(a) 
 
(b) 
 
(c) 
Figure 24. The differences between referencing image and other georeferenced images of the first 
image. (a) The differences between Ref-Img and Img-C++. (b) The differences between Ref-Img and 
Img-MATLAB. (c) The differences between Ref-Img and Img-ENVI. 
 
(a) 
 
(b) 
 
(c) 
Figure 25. The differences between referenced image and other georeferenced images of the second 
image. (a) The differences between Ref-Img and Img-C++. (b) The differences between Ref-Img and 
Img-MATLAB. (c) The differences between Ref-Img and Img-ENVI. 
4.3.3. Resource Occupation Analysis 
i re 25. e ifferences between referenced image and other georeferenced images of the second image.
(a) The differences between Ref-Img and Img-C++. (b) The differences between Ref-Img and Img-MATLAB.
(c) The differences between Ref-Img and Img-ENVI.
Remote Sens. 2019, 11, 124 24 of 28
As observed from Figure 24, the gray values of Figure 24a,b are 0, which means the proposed
method implemented using FPGA has the same accuracy with the implemented by MATLAB, and Visual
Studio (C++) [55,56]. Figure 24c indicates that the difference image between Ref-Img and Img-ENVI
has many pixels are not same. The mean values of Figure 24a–c are 0, 0, and 0.713 respectively, which are
less than 1 pixel.
As observe from Figure 25, the proposed method has the same accuracy as those implemented
using MATLAB, and Visual Studio (C++), and has a small portion of the gray values are different from
the Img-ENVI. The mean values of the Figure 25a–c are 0, 0, and 0.1265, respectively, which are all less
than 1 pixel.
4.3.3. Resource Occupation Analysis
The resource utilization ration, including the flip-flop (FF), look-up-table (LUT) and DSP48s
of the FPGA for the coordinate transformation method and bilinear interpolation function
are assessed, respectively.
In the coordinate transformation method, 250,656 LUTs, 499,268 registers, and 388 DSP48s are
utilized at rates of 40.96% (250656/612000 = 40.96%), 40.79%, and 37.96%, respectively (see Table 7).
In the bilinear interpolation function, floating-point and 32-bit fixed-point mixed operations are
adopted, which can reduce the resource consumption of an FPGA. Table 8 lists the resources occupied
for the bilinear interpolation scheme. 27,218 LUTs, 45,823 registers, 456 RAM/FIFO, and 267 DSP48s
are utilized at rates of 4.45%, 3.74%, 30.40%, and 7.42%, respectively.
Table 7. The logic unit utilization ratio of the coordinate transformation method.
Parameter Used Available Utilization Ratio (%)
Number of slice LUTs 250,656 612,000 40.96
Number of slice Registers 499,268 1,224,000 40.79
Number of DSP48s 388 3600 37.96
Table 8. The logic unit utilization ratio of the bilinear interpolation method.
Parameter Used Available Utilization Ratio (%)
Number of slice LUTs 27,218 612,000 4.45
Number of slice Registers 45,823 1,224,000 3.74
Number of block RAM/FIFO 456 1500 30.40
Number of DSP48s 267 3600 7.42
4.3.4. Processing Speed Comparison
The processing speed is considered as one of the most important factors for implementation by
an FPGA. Table 9 lists the processing speed implemented by FPGA, Visual Studio (C++) and MATLAB.
The size of the first raw image is 256 × 256 pixels2. After georeferencing, the size of georeferenced
image is 281 × 281 pixels2. The running time of the georeferencing method for the first image
using FPGA, Visual Studio (C++) and MATLAB is 0.13s, 1.06s, and 1.12s, respectively. The size of
the second raw image is 256 × 256 pixels2; after georeferencing, the image size is 285 × 277 pixels2.
The running time of the georeferencing method for the second raw image using FPGA, Visual Studio
(C++) and MATLAB is 0.15s, 1.21s, 1.26s, respectively. To put it simply, the processing speed using
FPGA is 8 times faster than that based on PC computer.
Remote Sens. 2019, 11, 124 25 of 28
Table 9. The consumption speed.
Raw image FPGA (second)f = 100 MHz Visual Studio (C++) (second) MATLAB (second) Size (pixels
2)
bldt_tm.img 0.13 1.06 1.12 281 × 281
tmAtlanta.img 0.15 1.21 1.26 285 × 277
4.3.5. Power Consumption
With the development of technology and the improvement of system performance, low power
consumption has become one of the measurement objectives of on-board system. Resources, speed,
and power consumption are three key factors in FPGA design. To obtain the power consumption,
Vivado software provides a comprehensive methodologies and strategies for power consumption
As observed from Figure 26, the powers of the dynamic and the device are 0.280 W (43%)
and 0.379 W (57%), respectively. The total on-chip power is 0.659 W, which is acceptable in on-board
processing platform.
Remote Sens. 2019, 11, x FOR PEER REVIEW 25 of 28 
 
(57%), respectively. The total on-chip p wer is 0.659W, which is acceptable in on-bo rd processing 
platform. 
 
Figure 26. Power consumption 
5. Conclusion  
This paper presents a novel scheme for on-board georeferencing using FPGA optimized second-
order polynomial equation and bilinear interpolation scheme, which consists of five modules: input 
data, coordinate transformation, bilinear interpolation and output data. The main contributions of 
this paper are as follows. 
First, a comprehensive framework has been developed to optimize the georeferencing algorithm 
based on an FPGA. (1) A floating-point block LU decomposition is used to inverse the matrix. 
Compared with the traditional LU decomposition, the multiplication and division operations are 
reduced by 1.02 and 1.25 times, respectively. The block LU decomposition method can reduce the 
complexity for inverting matrix and speed up the operation. (2) To reduce resource consumption of 
an FPGA, some strategies are adopted in programming, i.e., 32-bit integer and floating-point mixed 
operation and serial-parallel data communication.  
Second, the performances of the proposed algorithm are evaluated by error analysis, gray value 
comparison, resource occupation analysis, processing speed comparison, and power consumption. 
The RMSEs are less than one pixel, and other statistics, such as maximum, minimum, and mean error 
are less than one pixel. The gray values of the georeferenced image when implemented using the 
FPGA have the same accuracy as those implemented using MATLAB and Visual Studio (C++), and 
have a very close accuracy implemented using ENVI software. The processing speed using the 
proposed algorithm is 8 times faster than that based on PC computer. The on-chip power 
consumption is 0.659W. 
Therefore, it can be concluded that the proposed georeferencing algorithm implemented using 
FPGA with second-order polynomial model and bilinear interpolation algorithm can achieve real-
time geographic referencing for remote sensing images. 
 
Author Contributions: G. Zhou conceives and designs the experiments; D. Liu performs the experiments and 
writes the paper; J. Huang offers the help of FPGA implementation; L. S. analyzes the data; R. Zhang and X. 
Zhou revise the manuscript. X. C. improves the English language and style. 
Funding: This work was supported by grants from the Natural Science Foundation of China (grant No. 41431179, 
41601365); GuangXi Key Laboratory for Spatial Information and Geomatics Program (Contract No.17-259-16-
09); GuangXi innovative Development Grand under Grant 2018AA13005; GuangXi Key Research and 
development Program of China under Grant number 2016YFB502501; GuangXi innovative Development Grand 
Grant, (the grant number: GuiKe AA18118038). The State Oceanic Administration under Grant number 
[2014]#58; GuangXi Natural Science Foundation under grant number 2015GXNSFDA139032; and Guangxi Key 
Laboratory of Spatial Information and Geomatics Program under grant numbers 15-140-07-01 and 16-380-25-12. 
Figure 26. Power consumption
5. Conclusions
This paper presents a novel scheme for on-board georeferencing using FPGA optimized
second-order polynomial equation and bilinear interpolation scheme, which consists of five modules:
input data, coordinate transformation, bilinear interpolation and output data. The main contributions
of this paper are as follows.
First, a comprehensive framework has been developed to optimize the georeferencing algorithm
based on an FPGA. (1) A floating-point block LU decomposition is used to inverse the matrix.
Compared with the traditional LU decomposition, the multiplication and division operations are
reduced by 1.02 and 1.25 times, respectively. The block LU decomposition method can reduce
the complexity for inverting matrix and speed up the operation. (2) To reduce resource consumption
of an FPGA, some strategies are adopted in programming, i.e., 32-bit integer and floating-point mixed
operation and serial-parallel data communication.
Second, the performances of the proposed algorithm are evaluated by error analysis, gray value
comparison, resource occupation analysis, processing speed comparison, and power consumption.
The RMSEs are less than one pixel, and other statistics, such as maximum, minimum, and mean error
are less than one pixel. The gray values of the georeferenced image when implemented using
the FPGA have the same accuracy as those implemented using MATLAB and Visual Studio (C++),
and have a very close accuracy implemented using ENVI software. The processing speed using
the proposed algorithm is 8 times faster than that based on PC computer. The on-chip power
consumption is 0.659 W.
Total On-chip Power, 0.659 W 
Junction Temperature: 25.6 ·c 
Thermal Margin: 59.4 ·c (67.5 W) 
Effective i9JA; 0.8 'C/W 
Power supplied to off-chip devices: 0 W 
Confidence level: Low 
On-Chip Power 
■ Clocks: 
■ Signals: 
■ Logic: 
■ DRAM: 
■ DSP: 
■ !IQ; 
0.280W(43%) 
0.098W (35%) 
0.042W (15%) 
0.026W (9%) 
o.ooow (<!%) 
0.070W (25%) 
0.045W (15%) 
0.379W (57%) 
Remote Sens. 2019, 11, 124 26 of 28
Therefore, it can be concluded that the proposed georeferencing algorithm implemented using
FPGA with second-order polynomial model and bilinear interpolation algorithm can achieve real-time
geographic referencing for remote sensing images.
Author Contributions: G.Z. conceives and designs the experiments; D.L. performs the experiments and writes
the paper; J.H. offers the help of FPGA implementation; L.S. analyzes the data; R.Z. and X.Z. revise the manuscript.
X.C. improves the English language and style.
Funding: This work was supported by grants from the Natural Science Foundation of China (grant No. 41431179,
41601365); GuangXi Key Laboratory for Spatial Information and Geomatics Program (Contract No.17-259-16-09);
GuangXi innovative Development Grand under Grant 2018AA13005; GuangXi Key Research and development
Program of China under Grant number 2016YFB502501; GuangXi innovative Development Grand Grant, (the grant
number: GuiKe AA18118038). The State Oceanic Administration under Grant number [2014]#58; GuangXi
Natural Science Foundation under grant number 2015GXNSFDA139032; and Guangxi Key Laboratory of Spatial
Information and Geomatics Program under grant numbers 15-140-07-01 and 16-380-25-12.
Acknowledgments: The author would like to thank the reviewers for their constructive comments and suggestions.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Joyce, K.E.; Belliss, S.E.; Samsonov, S.V.; Mcneill, S.J.; Glassey, P.J. A review of the status of satellite remote
sensing and image processing techniques for mapping natural hazards and disasters. Prog. Phys. Geogr.
2009, 33, 183–207. [CrossRef]
2. Tralli, D.M.; Blom, R.G.; Zlotnicki, V.; Donnella, A.; Evans, D.L. Satellite remote sensing of earthquake,
volcano, flood, landslide and coastal inundation hazards. J. Photogr. Remote Sens. 2005, 59, 185–198.
[CrossRef]
3. Sanyal, J.; Lu, X.X. Application of remote sensing in flood management with special reference to monsoon
Asia: A review. Nat. Hazards 2004, 33, 283–301. [CrossRef]
4. Zhou, G. Near real-time orthorectificatoin and nosaic of small UAV-based video flow for time-critical event
response. IEEE Trans. Geosci. Remote Sens. 2009, 47, 739–747. [CrossRef]
5. Zhou, G.; Zhang, R.; Liu, N.; Huang, J.; Zhou, X. On-board ortho-rectification for images based on an FPGA.
Remote Sens. 2017, 9, 874. [CrossRef]
6. González, G.; Bernabé, S.; Mozos, D.; Plaza, A. FPGA Implementation of an algorithm for automatically
detecting targets in remotely Sensed hyperspectral images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.
2016, 9, 4334–4343. [CrossRef]
7. Qi, B.; Shi, H.; Zhuang, Y.; Chen, H.; Chen, L. On-board, real-time preprocessing system for optical
remote-sensing imagery. Sensors 2018, 18, 1328. [CrossRef] [PubMed]
8. Dawood, A.S.; Visser, S.J.; Williams, J.A. Reconfigurable FPGAs for real time image processing in space.
In Proceedings of the 2002 14th International Conference on Digital Signal Processing Proceedings,
DSP 2002 (Cat. No. 02TH8628), Santorini, Greece, 1–3 July 2002; pp. 845–848.
9. Zhang, Y.; Kerle, N. Satellite remote sensing for near-real time data collection. In Geospatial Information
Technology for Emergency Response; CRC Press: Taylor & Francis, London, UK, 2008; pp. 91–118.
10. Zhou, G.; Chen, W.; Kelmelis, J.A.; Zhang, D. A comprehensive study on urban true orthorectification.
IEEE Trans. Geosci. Remote Sens. 2005, 43, 2138–2147. [CrossRef]
11. Zhou, G. Geo-Referencing of video flow from small low-cost civilian UAV. IEEE Trans. Autom. Eng. Sci. 2010,
7, 156–166. [CrossRef]
12. Ziboon, A.R.T.; Mohammed, I.H. Accuracy assessment of 2D and 3D geometric correction models for
different topography in Iraq. Eng. Technol. J. Part A Eng. 2013, 31, 2076–2085.
13. Kartal, H.; Sertel, E.; Alganci, U. Comperative analysis of different geometric correction methods for very
high resolution pleiades images. In Proceedings of the 2017 8th International Conference on Recent Advances
in Space Technologies (RAST), Istanbul, Turkey, 19–22 June 2017; pp. 171–175.
14. Wang, T.; Zhang, G.; Li, D.; Tang, X.; Pan, H.; Zhu, X.; Chen, C. Geometric accuracy validation for ZY-3
satellite imagery. IEEE Geosci. Remote Sens. Lett. 2014, 11, 1168–1171. [CrossRef]
Remote Sens. 2019, 11, 124 27 of 28
15. Chen, J.; Joang, T.; Lu, W.; Han, M. The geometric correction and accuracy assessment based on Cartosat-1
satellite image. In Proceedings of the 2010 3rd International Congress on Image and Signal Processing,
Yantai, China, 16–18 October 2010; pp. 1253–1257.
16. Lee, C.A.; Gasster, S.D.; Plaza, A.; Chang, C.I.; Huang, B. Recent developments in high performance
computing for remote sensing: A review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2011, 4, 508–527.
[CrossRef]
17. Plaza, A.; Valencia, D.; Plaza, J.; Martinez, P. Commodity cluster-based parallel processing of hyperspectral
imagery. J. Parallel Distrib. Comput. 2006, 66, 345–358. [CrossRef]
18. Plaza, A.; Du, Q.; Chang, Y.L.; King, R.L. High performance computing for hyperspectral remote sensing.
IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2011, 4, 528–544. [CrossRef]
19. Fang, L.; Wang, M.; Li, D.; Pan, J. CPU/GPU near real-time preprocessing for ZY-3 satellite images:
Relative radiometric correction, MTF compensation, and geocorrection. ISPRS J. Photogr. Remote Sens.
2014, 87, 229–240. [CrossRef]
20. Van der Jeught, S.; Buytaert, J.A.N.; Dirckx, J.J. Real-time geometric lens distortion correction using a graphics
processing unit. Opt. Eng. 2012, 51. [CrossRef]
21. Thomas, O.; Trym, V.H.; Ingebrigt, W.; Ingebrigt, W. Real-time georeferencing for an airborne hyperspectral
imaging system. Algorithms Technol. Multispectral Hyperspectral Ultraspectral Imagery XVII 2011, 8048.
[CrossRef]
22. Reguera-Salgado, J.; Calvino-Cancela, M.; Martin-Herrero, J. GPU geocorrection for airborne pushbroom
imagers. IEEE Trans. Geosci. Remote Sens. 2012, 50, 4409–4419. [CrossRef]
23. López-Fandiño, J.; Barrius, P.Q.; Heras, D.B.; Argüello, F. Efficient ELM-based techniques for the classification
of hyperspectral remote sensing images on Commodity GPUs. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.
2015, 8, 2884–2893. [CrossRef]
24. Lu, J.; Zhang, B.; Gong, Z.; Li, E.; Liu, H. The remote-sensing image fusion based on GPU. In Proceedings
of the International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences,
Beijing, China, 23 October 2008; pp. 1233–1238.
25. Zhu, H.; Cao, Y.; Zhou, Z.; Gong, M. Parallel multi-temporal remote sensing image change detection on GPU.
In Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops & PhD
Forum, IEEE Computer Society, Shanghai, China, 21–25 May 2012; pp. 1898–1904.
26. Ma, Y.; Chen, L.; Liu, P.; Lu, K. Parallel programing templates for remote sensing image processing on GPU
architectures: Design and implementation. Computing 2016, 98, 7–33. [CrossRef]
27. Lopez, S.; Vladimirova, T.; Gonzalez, G.; Resano, J.; Mozos, D. The promise of reconfigurable romputing for
hyperspectral imaging onboard systems: A review and trends. Proc. IEEE 2013, 101, 698–722. [CrossRef]
28. Zhou, G.; Baysal, O.; Kaye, J.; Habib, S.; Wang, C. Concept design of future intelligent earth observing
satellites. Int. J. Remote Sens. 2004, 25, 2667–2685. [CrossRef]
29. Huang, J.; Zhou, G.; Zhang, D.; Zhang, G.; Zhang, R.; Baysal, O. An FPGA-based implementation of corner
detection and matching with outlier rejection. Int. J. Remote Sens. 2018, 1–20. [CrossRef]
30. Pakartipangi, W.; Darlis, D.; Syihabuddin, B.; Wijanto, H. Analysis of camera array on board
data handling using FPGA for nano-satellite application. In Proceedings of the International Conference on
Telecommunication Systems Services and Applications, Bandung, Indonesia, 25–26 November 2015; pp. 1–6.
31. Huang, J.; Zhou, G.; Zhou, X. A new FPGA architecture of fast and brief algorithm for on-board corner
detection and matching. Sensors 2018, 18, 1014. [CrossRef] [PubMed]
32. Yu, G.; Vladimirova, T.; Sweeting, M.N. Image compression systems on board satellites. Acta Astronautica
2009, 64, 988–1005. [CrossRef]
33. Long, T.; Jiao, W.; He, G.; Zhang, Z. A fast and reliable matching method for automated georeferencing of
remotely-sensed imagery. Remote Sens. 2016, 8, 56. [CrossRef]
34. Williams, J.A.; Dawood, A.S.; Visser, S.J. FPGA-based cloud detection for real-time onboard remote sensing.
In Proceedings of the 2002 IEEE International Conference on Field-Programmable Technology (FPT),
Hong Kong, China, 16–18 December 2002; pp. 110–116.
35. González, C.; Mozos, D.; Resano, J.; Plaza, A. FPGA implementation of the N-FINDR algorithm for remotely
sensed hyperspectral image analysis. IEEE Trans. Geosci. Remote Sens. 2012, 50, 374–388. [CrossRef]
36. Swann, R.; Hawkins, D.; Westwellroper, A.; Johnstone, W. The potential for automated mapping from
geocoded digital image data. Photogr. Eng. Remote Sens. 1988, 54, 187–193.
Remote Sens. 2019, 11, 124 28 of 28
37. Savoy, F.M.; Dev, S.; Lee, Y.H.; Winkler, S. Geo-referencing and stereo calibration of ground-based Whole
Sky Imagers using the sun trajectory. In Proceedings of the 2016 IEEE International Geoscience and Remote
Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 7473–7476.
38. Jensen, J.R.; Lulla, K. Introductory Digital image processing-A remote sensing perspective.
Environ. Eng. Geosci. 2007, 13, 89–90. [CrossRef]
39. Chen, L.C.; Teo, T.A.; Liu, C.L. The geometrical comparisons of RSM and RFM for FORMOSAT-2 satellite
images. Photogr. Eng. Remote Sens. 2006, 72, 573–579. [CrossRef]
40. Bannari, A.; Morin, D.; Bénié, G.B.; Bonn, F.J. A theoretical review of different mathematical models of
geometric corrections applied to remote sensing images. Remote Sens. Rev. 1995, 13, 27–47. [CrossRef]
41. Toutin, T. Geometric processing of remote sensing images: Models, algorithms and methods. Int. J.
Remote Sens. 2004, 25, 1893–1924. [CrossRef]
42. Raffa, M.; Mercogliano, P.; Galdi, C. Georeferencing raster maps using vector data: A meteorological
application. In Proceedings of the 2016 IEEE Metrology for Aerospace (MetroAeroSpace), Florence, Italy,
22–23 June 2016; pp. 102–107.
43. Zhou, G.; Ron, L. Accuracy evaluation of ground points from IKONOS high-resolution satellite imagery.
Photogr. Eng. Remote Sens. 2000, 66, 1103–1112.
44. Zhou, G.; Yue, T.; Shi, Y.; Zhang, R.; Huang, J. Second-order polynomial equation-based block adjustment for
orthorectification of DISP imagery. Remote Sens. 2016, 8, 680. [CrossRef]
45. Shlien, S. Geometric correction, registration, and resampling of Landsat imagery. Can. J. Remote Sens. 1979,
5, 74–89. [CrossRef]
46. Gribbon, K.T.; Bailey, D.G. A novel approach to real-time bilinear interpolation. In Proceedings of
the DELTA 2004 Second IEEE International Workshop on Electronic Design, Test and Applications,
Perth, WA, Australia, 28–30 January 2004; pp. 126–131.
47. Bailey, D.G. Design for Embedded Image Processing on FPGAs; John Wiley & Sons (Asia) Pte Ltd.: solaris south
tower, Singapore, 2011; pp. 275–305. ISBN 978-0-470-82849-6.
48. Huang, J. FPGA-Based Optimization and Hardware Implementation of P-H Method for Satellite Relative
Attitude and Absolute Attitude Solution. Ph.D. Thesis, Tianjin University, Tianjin, China, 2019.
49. Zhou, G.; Huang, J.; Shu, L. An FPGA-based P-H method on-board solution for satellite relative altitude.
Geomat. Inf. Sci. Wuhan University. 2018, 43, 1–9. [CrossRef]
50. Daga, V.; Govindu, G.; Prasanna, V.; Gangadharapalli, S.; Sridhar, V. Efficient floating-point based block LU
decomposition on FPGAs. In Proceedings of the International Conference on Engineering of Reconfigurable
Systems and Algorithms (Ersa’04), Las Vegas, NV, USA, 21–24 June 2004; pp. 276–279.
51. Gill, T.; Collett, L.; Armston, J.; Eustace, A.; Danaher, T.; Scarth, P.; Flood, N.; Phinn, S. Geometric correction
and accuracy assessment of landsat-7 etm+ and landsat-5 tm imagery used for vegetation cover monitoring
in queensland, Australia from 1988 to 2007. Surveyor 2010, 55, 273–287. [CrossRef]
52. Chen, J.; Ji, K.; Shi, Z.; Liu, W. Implementation of block algorithm for LU factorization. In Proceedings of
the 2009 WRI World Congress on Computer Science and Information Engineering, Los Angeles, CA, USA,
31 March–2 April 2009; pp. 569–573.
53. Richards, J.A.; Richards, J.A. Remote Sensing Digital Image Analysis; Springer: Berlin/Heidelberg, Germany,
1999; pp. 39–74. ISBN 978-3-642-30062-2.
54. Schowengerdt, R.A. Chapter 7-correction and calibration. In Remote Sensing, 3rd ed.; Academic Press:
Cambridgem, MA, USA, 2007; p. 285-XXII. ISBN 978-0-12-369407-2.
55. French, J.C.; Balster, E.J.; Turri, W.F. A 64-bit orthorectification algorithm using fixed-point arithmetic.
High-Perform. Comput. Remote Sens. 2013, 8895. [CrossRef]
56. Shaffer, D.A. An FPGA Implementation of Large-Scale Image Orthorectification. Ph.D. Thesis,
University of Dayton, Dayton, OH, USA, 2018.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).
