Experiment Design Regularization-Based Hardware/Software Codesign for Real-Time Enhanced Imaging in Uncertain Remote Sensing Environment by A. Castillo Atoche et al.
Hindawi Publishing Corporation
EURASIP Journal on Advances in Signal Processing
Volume 2010, Article ID 254040, 21 pages
doi:10.1155/2010/254040
Research Article
Experiment Design Regularization-Based Hardware/Software
Codesign for Real-Time Enhanced Imaging in Uncertain Remote
Sensing Environment
A. Castillo Atoche,1, 2 D. Torres Roman,1 and Y. Shkvarko1
1Cinvestav del IPN, Unidad Guadalajara, Guadalajara, Avenida Cient´ıfica # 1145, Colonia El Baj´ıo, Zapopan, Jalisco,
C.P. 45015, Mexico
2Universidad Autonoma de Yucatan, Av. Industrias No Contaminantes s/n, Col. Cordemex. Merida, Mexico
Correspondence should be addressed to A. Castillo Atoche, acastill@uady.mx
Received 14 July 2009; Revised 6 November 2009; Accepted 13 January 2010
Academic Editor: Bernhard Wess
Copyright © 2010 A. Castillo Atoche et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly
cited.
A new aggregated Hardware/Software (HW/SW) codesign approach to optimization of the digital signal processing techniques for
enhanced imaging with real-world uncertain remote sensing (RS) data based on the concept of descriptive experiment design
regularization (DEDR) is addressed. We consider the applications of the developed approach to typical single-look synthetic
aperture radar (SAR) imaging systems operating in the real-world uncertain RS scenarios. The software design is aimed at the
algorithmic-level decrease of the computational load of the large-scale SAR image enhancement tasks. The innovative algorithmic
idea is to incorporate into the DEDR-optimized fixed-point iterative reconstruction/enhancement procedure the convex
convergence enforcement regularization via constructing the proper multilevel projections onto convex sets (POCS) in the solution
domain. The hardware design is performed via systolic array computing based on a Xilinx Field Programmable Gate Array (FPGA)
XC4VSX35-10ﬀ668 and is aimed at implementing the unified DEDR-POCS image enhancement/reconstruction procedures in a
computationally eﬃcient multi-level parallel fashion that meets the (near) real-time image processing requirements. Finally, we
comment on the simulation results indicative of the significantly increased performance eﬃciency both in resolution enhancement
and in computational complexity reduction metrics gained with the proposed aggregated HW/SW co-design approach.
1. Introduction
In this paper, we address a new aggregated Hardware/
Software (HW/SW) codesign approach to optimization of
the digital signal and image processing techniques as required
for enhanced remote sensing (RS) of the environment with
the use of high-resolution array radar and synthetic aperture
radar (SAR) systems. At the algorithm-design level, the
RS imaging problem is treated as an ill-posed nonlinear
inverse problem of reconstruction of the spatial spectrum
pattern (SSP) of the backscattered field distributed over
the remotely sensed scene via processing the SAR data
signals distorted in the uncertain stochastic measurement
channel [1–6]. The operational scenario uncertainties are
attributed to inevitable random signal perturbations in
inhomogeneous propagation medium [1, 2, 4], possible
imperfect radar/SAR system calibration [1, 3], and SAR
carrier trajectory deviations [3, 5, 6]. The unified approach
that we address to solve such a problem is based on the
recently proposed concept of descriptive experiment design
regularization (DEDR) [7, 8]. The general DEDR method
constructed in [7, 8] incorporates into the minimum risk
(MR) nonparametric estimation strategy [4] the experiment
design-motivated constraints of the image identifiably for
the discrete-form signal formation operator (SFO) specified
by the employed signal modulation format [4–6]. On one
hand, a considerable advantage of such DEDR paradigm
relates to its flexibility in designing the desirable error
metrics in the corresponding image representation space via
defining diﬀerent descriptive cost functions [7, 9]. On the
other hand, the crucial limitations of the DEDR method
relate to the necessity of performing simultaneously the
2 EURASIP Journal on Advances in Signal Processing
solution-dependent SFO inversion operations and adaptive
adjustments of the degrees of freedom of the overall DEDR
image enhancement techniques ruled by the employed fixed-
point iterative process [8]. For the real-world large-scale
RS scenes, such adaptive full-format DEDR-optimal method
turns out to be computationally extremely consuming,
therefore cannot be recommended as a practical technique
realizable in (near) real time [10]. The innovative idea of
this paper is to aggregate the DEDR-optimal fixed-point
iterative reconstruction/enhancement procedures developed
in the previous studies [7, 8, 10] with the multi-level
robustness and convergence enforcing regularization via
constructing the proper projections onto convex sets (POCS)
in the solution domain. The established POCS-regularized
iterative DEDR technique is performed separately along
the range and azimuth directions over the scene frame
making an optimal use of the range-azimuth sparseness
properties of the employed radar/SAR modulation format.
Thus, at the SW codesign stage, we address two conceptually
innovative propositions that distinguish our approach from
the previous studies [7, 10]. First, two possible observation
scenarios (instead of one) are unified now under the DEDR
paradigm for the HW/SW codesign, namely, (i) regular case
without model uncertaintiess and (ii) uncertain scenario
with random perturbations in the SFO. Second, the POCS
regularization is proposed to be performed in an aggregated
multi-level fashion to make the optimal use of the non-
trivial RS system model information for constructing the
corresponding robustness and convergence enforcing POCS
operators. In particular, the positivity and range-azimuth
orthogonalization projectors of [10] are aggregated with the
point spread function (PSF) sparseness enforcing sliding
window projectors acting in parallel over both range and
azimuth image frames that set the corresponding PSF pixel
values to zeroes outside their specified support regions.
Such aggregated POCS regularization drastically speeds up
the resulting fixed-point iterative DEDR techniques making
them exactly well fitted for the systolic computational
implementation form; that is, provides the SW algorithmic
base for the further HW codesign level of the problem
treatment.
At the HW codesign stage, we propose to pursue the
System-on-Chip (SoC) single Field Programmable (FP)
unit integration approach [9–14], which allows eﬃcient
coupling/integration of a number of predetermined complex
components. Such a programmable unit is a viable solution
for rapid prototyping and digital implementation of the
radar/SAR image enhancement techniques developed at the
SW codesign stage, in spite of designing the process in a
common personal computer (PC) [11–14]. The main advan-
tage of the proposed FPSoC platform is that all required
component designs, including the embedded processor unit,
memory, and peripherals are algorithmically “adapted” for
the particular developed POCS-regularized iterative fixed-
point DEDR image enhancement techniques. Therefore,
at the HW design stage, the novel contribution of this
study is twofold: first, the addressed HW/SW codesign
methodology is aimed at an HW implementation of the
developed software using systolic arrays as coprocessors units;
second, the proposed systolic-based processing architecture
is particularly adapted for computational implementation of
the unified DEDR-POCS techniques in a computationally
eﬃcient fashion that meets the (near) real-time overall RS
imaging system requirements. We resume this study with
the analysis of the simulation results related to enhancement
of the real-world degraded large-scale SAR imagery (i.e.,
acquired in uncertain operational scenarios) indicative of the
significantly increased reconstruction eﬃciency gained with
the proposed HW/SW codesign approach.
2. Background
2.1. Continuous-Form Problem Model. The general formal-
ism of the RS imaging problem presented in this paper is a
structural extension of the problem considered in [4, 7, 8],
hence some crucial model elements are repeated for conve-
nience to the reader. Consider a coherent RS experiment in
a random medium and the narrowband SAR assumption [1]
that enables us to model the extended object backscattered
wavefield in the baseband format [3] by imposing its time
invariant complex scattering (backscattering) function e(x)
in the object image domain (scattering surface) X  x. The
measurement radar/SAR data field u(y) = s(y)+n(y) consists
of the echo signals s and additive noise n and is available
for observations and recordings within the prescribed time-
space observation domain Y = T × P, where y = (t, ρ)T
defines the time(t)-space(ρ) points in Y ; t ∈ T , ρ ∈ P; y ∈
Y. The model of the data field u is defined by specifying
the stochastic equation of observation (EO) that in the













































The random kernel S˜(y, x) of the perturbed random
signal formation operator (SFO) S˜ given by (1) defines the
signal field formation model. Its mean S(y, x) is referred to as
the nominal modulation law in the data formation channel
defined by the time-space modulation of signals employed in
a particular imaging radar/SAR system [3], and the variation
about the mean ΔS(y, x) = μ(y, x)S(y, x) models the stochas-
tic perturbations of the wavefield at diﬀerent propagation
paths, where μ(y, x) represents the zero-mean multiplicative
noise that models random propagation perturbations in the
medium (the so-called general Rytov model [3, 5, 6]). The
fields e,n,u in (1) are assumed to be zero-mean complex
valued Gaussian random fields [3]. Next, we assume an
incoherent nature of the backscattered field e(x). This is
naturally inherent to the RS experiments [1, 3, 5, 6] and
leads to the δ-form of the object field correlation function,
Re(x1, x2) = b(x1)δ(x1−x2), where e(x) and b(x) = 〈|e(x)|2〉
are referred to as the object random complex scattering
function and its average power scattering function or spatial
spectrum pattern (SSP), respectively.
EURASIP Journal on Advances in Signal Processing 3
The problem of enhanced RS imaging is to develop
a signal processing method for performing the eﬃcient
estimation of the SSP b(x) by processing the available
radar/SAR measurements of the data wavefield u(y). Such
estimate b̂(x) of the SSP b(x) is referred to as the desired
reconstructed RS image of the remotely sensed scene.
2.2. Discrete-Form Problem Model. Now we proceed from
the stochastic integral-form EO (1) to its finite-dimensional
discrete (vector) form approximation [8]
u = S˜e + n = Se + Δe + n, (2)
in which the disturbed SFO matrix
S˜ = S + Δ (3)
is the discrete-form approximation of the integral SFO
S˜ defined by the EO (1), and e, n, u represent zero-
mean vectors composed of the decomposition (sampling)
coeﬃcients {ek; k = 1, . . . ,K}, {nm;m = 1, . . . ,M}, and
{um;m = 1, . . . ,M}, respectively [7]. These vectors are
characterized by the correlation matrices: Re = D = D(b) =
diag(b) (a diagonal matrix with vector b at its principal
diagonal), Rn, and Ru = 〈S˜ReS˜+〉p(Δ) + Rn, respectively,
where 〈· 〉p(Δ) defines the averaging performed over the
randomness of Δ characterized by the probability density
function p(Δ) unknown to the observer, and superscript +
stands for Hermitian conjugate (conjugate transpose). Vector
b is composed of the elements bk = 〈ekek∗〉 = 〈|ek|2〉;
k = 1, . . . ,K , and is referred to as a K-D vector-form
representation of the SSP. The SSP vector b is associated with
the so-called lexicographically ordered image pixels [7, 9].
The corresponding conventional Ky ×Kx rectangular frame-
ordered scene image B = {b(kx, ky); kx = 1, . . . ,Kx; ky =
1, . . . ,Ky} relates to its lexicographically ordered vector-form
representation b = {bk = b(k); k = 1, . . . ,K = Ky × Kx} via
the standard row by row expansion (so-called lexicographical
reordering) procedure, B = L{b} [9]. Note that in the
simple case of a certain operational scenario [1, 3, 7], the
discrete-form (i.e., matrix-form) SFO S is assumed to be
deterministic, in which case the random perturbation term
in (3) is irrelevant, Δ = 0.
The digital enhanced RS imaging problem is formally
stated as follows: to reconstruct the scene pixel frame
image B̂ via lexicographical reordering B̂ = L{b̂} of the
SSP vector estimate b̂ estimated from whatever available
discrete measurements of the recorded radar/SAR data u.
The reconstructed SSP vector b̂ is an estimate of the second-
order statistics of the scattering vector e observed through
the perturbed SFO (3) contaminated with additive noise n
and corrupted also with the signal-dependent multiplicative
noise Δe, hence, the enhanced RS imaging problem at hand
must be qualified and treated as a statistical nonlinear inverse
problem with model uncertainties. The high-resolution sens-
ing implies formation of the RS image B̂ based on some
statistically optimal solution of such an inverse problem
robust against the problem model uncertainties. In this paper
we propose to unify the POCS regularization with the DEDR
method originally developed in [7, 8].
3. Unified DEDR Method
3.1. DEDR Strategy for Certain Operational Scenario. In the
descriptive statistical formalism, the desired SSP vector b̂ is
recognized to be the vector of a principal diagonal of the
estimate of the correlation matrix Re(b); that is, b̂ = {R̂e}diag.
Thus one can seek to estimate b̂ = {R̂e}diag given the data
correlation matrix Ru preestimated empirically via averaging
J ≥ 1 recorded data vector snapshots {u( j)}; for example, [7]






















where {·}diag defines the vector composed of the principal
diagonal of the embraced matrix.
To optimize the search for F in the certain operational












that implies the minimization of the weighted sum of the
systematic and fluctuation errors in the desired estimate
b̂ where the selection (adjustment) of the regularization
parameter α and the weight matrix A provide the additional
experiment design degrees of freedom incorporating any
descriptive properties of a solution if those are known a
priori [3, 7]. It is easy to recognize that the strategy (6)
is a structural extension of the statistical minimum risk
estimation strategy [4] for the nonlinear spectral estimation
problem at hand because in both cases the balance between
the gained spatial resolution and the noise suppression in the
resulting estimate is to be optimized.
3.2. Extended DEDR Strategy for Uncertain Scenario. To
optimize the search for the desired SO F in the uncertain
operational scenario with the randomly perturbed SFO (3),
the extended DEDR strategy was proposed in [8] as











4 EURASIP Journal on Advances in Signal Processing
where the conditioning term (9) represents the worst-
case statistical performance (WCSP) regularizing constraint
imposed on the unknown second-order statistics 〈‖Δ‖2〉p(Δ)
of the random distortion component Δ of the SFO matrix

















where the regularization parameter α and the metrics
inducing weight matrix A compose the processing level
“degrees of freedom” of the DEDR method.
To proceed with the derivation of the robust SO (8),
in [8], the risk function (10) was next decomposed and
evaluated for its the maximum value applying the Cauchy-
Schwarz inequality and Loewner ordering [9] of the weight
matrix A ≤ γI with the scaled Loewner ordering factor
γ = min{γ : A ≤ γI}. With these robustifications [8],



















= (Rn + βI
)




3.3. DEDR-Optimal Solution Operators. Examining the
DEDR strategies (6) and (11) one can deduce that those both
are structurally similar and diﬀer only by the definition of the
second (i.e., noise) risk component terms in (7) and (12). In
the certain operational scenario [5–7], the trace {FRnF+}for
the noise error measure is used, while in the uncertain
scenario [8] the augmented measure tr{FRΣF+} is employed
with the diagonal loaded extension (13) of the composite
noise correlation matrix RΣ. The established structural
similarity (the so-called problem model homomorphism
[5, 6]) of two DEDR problems (6) and (11) makes it possible
to unify the solutions for both scenarios. Doing so, we specify
the SOs for both considered operational scenarios, namely:
(1) SO for certain operational scenario follows directly
from the solution to the optimization problem (6) found in
the previous study [7] that results in
F = KS+R−1n , (14)
where
K = (S+R−1n S + αA−1)−1 (15)
represents the so-called regularized reconstruction operator
[7]; R−1n is the noise whitening filter, and the adjoint (i.e.,
Hermitian transpose) SFO S+ defines the matched spatial
filter in the conventional signal processing terminology [1,
3];
(2) SO for uncertain operational scenario follows as
structural extension of (14) for the augmented (diagonal
loaded) RΣ that yields [8]
F = KΣS+R−1Σ , (16)
where
KΣ = (S+R−1Σ S + αA−1)−1 (17)
represents the robustified reconstruction operator for the
uncertain scenario.
3.4. DEDR-Related Imaging Techniques. Here we exemplify
three practically motivated DEDR-related imaging tech-
niques [7, 8], that will be used at the HW codesign stage,
namely, the conventional matched spatial filtering (MSF)
method, and two high-resolution reconstructive imaging
techniques: (i) the robust spatial filtering (RSF), and (ii) the
robust adaptive spatial filtering (RASF) methods.
(1) MSF. The MSF algorithm is a member of the DEDR-
related family [7] specified for α ‖S+S‖, that is, the
case of a dominating priority of suppression of noise
over the systematic error in the optimization problem
(6). In this case, the SO (14) is approximated by the
matched spatial filter (MSF) [7]:
FMSF = F(1) ≈ S+. (18)
(2) RSF. The RSF method implies no preference to any
prior model information (i.e., A−1 =I) and balanced
minimization of the systematic and noise error
measures in (9), (11) by adjusting the regularization
parameter α to the inverse of the signal-to-noise ratio
(SNR). In that case the SO becomes the Tikhonov-
type robust spatial filter (RSF) [7]:
FRSF = F(2)
= (S+S + αRSFI)−1S+,
(19)
in which the RSF regularization parameter αRSF is
adjusted to a particular operational scenario model,
namely, αRSF = (N0/b0) for the case of a certain
operational scenario [7], and αRSF = (NΣ/b0) in the
uncertain operational scenario case [8], respectively,
where N0 represents the white observation noise
power density, b0 is the average a priori SSP value,
andNΣ = N0 + β corresponds to the augmented noise
power density in the correlation matrix specified by
(13).
EURASIP Journal on Advances in Signal Processing 5
(3) RASF. In the Bayesian statistically optimal problem
treatment, α and A are adjusted in an adaptive fashion
following the Bayesian minimum risk strategy [8],
that is, αA−1 = D̂ = diag(b̂), the diagonal matrix
with the estimate b̂ at its principal diagonal, in
which case the SOs (14), (16) become itself solution-
dependent operators that result in the following


















for the uncertain operational scenario [8], respectively. Next,
in all practical RS scenarios [1–3] (and specifically, in SAR
uncertain imaging applications [2, 7, 8]), it is a common
practice to accept the robust white additive noise model,
that is Rn = N0I, attributing the unknown correlated noise
component as well as multiplicative speckle noise to the
composite uncertain noise term Δe in (2), in which case RΣ =
NΣI with the composite noise power density NΣ = N0 + β,
the initial observation noise variance N0 augmented by the
loading factor β specified by (13).
Using the defined above SOs, the DEDR-related data pro-
cessing techniques in the conventional pixel-frame format











; p = 1, 2, 3, 4
(22)
with F(1) = FMSF; F(2) = FRSF, and F(3) = FRASF, F(4) =
FRASFΣ, respectively. Any other feasible adjustments of the
DEDR degrees of freedom (the regularization parameters
α, β, and the weight matrix A) provide other possible
DEDR-related SSP reconstruction techniques, that we do not
consider in this paper.
4. POCS Regularized DEDR Method
Because of the extremely high dimension (K × K) = (Ky ×
Kx) × (Ky × Kx) ∼ 1012 of the operator inversions required
to form the corresponding SOs specified by (20), (21),
it is questionable to recommend the general-form DEDR-
optimal method (22) as a practical enhanced RS imaging
technique realizable in (near) real computational time.
Hence, one has to proceed from the conventional-form (Ky×
Kx)×(Ky×Kx)-dimensional RSF and RASF algorithms (that
require cumbersome operator inversions (20)–(22) to more
computationally eﬃcient iterative techniques that do not
involve the large-scale operator inversions and incorporate
the convergence enforcement regularization into the DEDR
procedure via constructing the proper projections onto
convex sets (POCS) in the solution domain. In the consid-
ered here RS imaging applications, such POCS is aimed at
performing the factorization of the overall procedures over
the orthogonal range (y)-azimuth (x) coordinates in the
scene frame making also an optimal use of the sparseness
properties of the employed radar/SAR modulation format.
Thus, the innovative idea is to perform the POCS regular-
ization in an aggregated multi-level fashion. In particular,
we propose to aggregate the positivity and range-azimuth
orthogonalization projectors constructed previously in [10]
with the point spread function (PSF) sparseness enforcing
sliding window projectors acting in parallel over both range
and azimuth image frames that set the corresponding PSF
pixel values to zeroes outside their specified support regions.
In this section, we address such a unified multi-level POCS-
regularized iterative DEDR method as an extension of the
previously proposed single-level DEDR-POCS [10] that we
develop here in two stages.
4.1. First Stage: Fixed-Point Iterative DEDR Algorithm. The
first stage is a structural extension of the fixed-point method
considered in [10], the extension being done for the case
of the unified SOs specified now by (14) and (16). Thus,
following the fixed-point algorithm design scheme of [10,










n = 0, 1, . . ., where P is a convergence enforcing projector
(i.e., the POCS-regularizing operator) that will be construct











represents the self-adjoint reconstruction operator at the ith
iteration step, n = 0, 1, . . ., and
Ψ = S+S (25)
is the nominal system point spread function (PSF) oper-
ator (a K×K matrix). Applying routinely the fixed-point
technique [9, 10] to the equation (23), we derive the
desired extended POCS-regularized iterative SSP estimation
algorithm
b̂[n+1] = P b̂[0] + P T[n]b̂[n]; n = 0, 1, . . . . (26)
Here,










; n = 0, 1, . . .
(27)


















Azimuth PSM· · ·Ψa(Δx)






Figure 1: Illustration of the scene image degradations over the range and azimuth directions with factorized PSMs.








Figure 2: General framework of the DEDR method.
























◦ denotes the Shur-Hadamar (element-by-element) matrix
product, and the zero-step iteration





is formed as an outcome of the MSF algorithm from the
DEDR family (22) specified for the adjoint SFO solution
operator S+.
4.2. B. Second Stage: Multilevel POCS Regularization. Next,
to specify the regularizing POCS projector operator P
in the fixed-point algorithm (26) let us make the use of
factorization of the PSM (25) over the azimuth (x) and
range (y) coordinates valid for all existing imaging radar/SAR
systems [2, 3, 9]. Such factorization is illustrated in Figure 1.
We formalize this stage by introducing the range-azimuth
factorization operator Pa⊥r , the same one as in the previous
POCS regularization considered in [10]. Next, to make
a use of the intrinsic sparseness properties of the SAR
point spread functions over the range and azimuth frames,
we propose to incorporate the new POCS regularization
stage via constructing the x-y factorized projection oper-
ator (algorithm)Pκa⊥κr that acts as a composition of the
orthogonal sliding windows [9] with the window apertures
adjusted to the PSM widths: (i) 2κa specifies the azimuth
window frame adjusted to the eﬀective pixel width of the
non-zero strip Ψa(x) of the azimuth PSM Ψa along the x
axis; (ii) 2κr specifies the range window frame adjusted to
the eﬀective pixel width of the non-zero strip Ψr(y) of the
range PSM Ψr along the y axis, respectively, as illustrated
in Figure 1. Such the sliding window projector Pκa⊥κr is
an easy-to-implement numerical algorithm [9] that simply
sets the pixels values to zero outside the support regions
2κa ⊂ Kx and 2κr ⊂ Ky around every particular pixel
B̂[i](kx ,ky); kx = 1, . . . ,Kx; kv = 1, . . . ,Ky in the rectangular
image frame B̂[n] = L{b̂[n]} separately reconstructed via
(26) along the corresponding x and y axes, respectively. Last,
following [10], to enforce prior knowledge on the intrinsic
positivity of the SSP we impose, in addition to Pa⊥r and
Pκa⊥κr , the positivity operator (algorithm) P+ that has the
EURASIP Journal on Advances in Signal Processing 7
eﬀect of clipping oﬀ all negative values [8]. The defined above
orthogonal projecting window Pκa⊥κr and positivity operator
P+ are projectors onto convex sets, that is POCS operators
[9], thus a composition
P = P+Pκa⊥κrPa⊥r (31)
is a POCS operator as well. While this definition in the terms
of the proposed aggregated projections sounds complicated,
the algorithmic meaning of (31) is very simple and is easily
established in the algorithmic form familiar to the signal
processing and RS communities. Acting on a b̂[n] (that may
be not a member of the convex set at a particular iteration
i), the P applied to b̂[n] produces the member of the convex
cone set composed of non-negative elements that is nearest
to b̂[n] in the sense of minimization of the L2 norm ‖P b̂[n]−
b̂[n]‖ [9, Section 15.4].
Now, the application of the P constructed by (31) to the
iteration process (26) with the corresponding lexicographical
reordering B̂ = L{b̂} yields the desired resulting POCS-













; n = 0, 1, . . . ,
(32)
in which the zero-step iteration B̂[0] = L{b̂[0]} is formed
using the conventional (i.e., low-resolution) MSF imaging
algorithm (30), the aggregated convergence enforcing POCS
regularizing operator is constructed by (31), and the matrix-
form fixed-point iteration operator T[n] is specified by (27).
We address such POCS-regularized DEDR technique
(32) as the unified DEDR-POCS method. Its general frame-
work is presented in Figure 2. Note that the fixed-point
process (32) does not involve the cumbersome operator
inversions (in contrast to the initial DEDR techniques
defined by (5), (22) and, moreover, it is performed separately
along the range (y) and azimuth (x) directions making
an optimal use of the PSM sparseness properties (κa 
Kx, κr  Ky). These features of the POCS-regularized
RSF and RASF algorithms generalized by (32) result in the
drastically decreased algorithmic computational complexity
(e.g, (Kx/κa)× (Kx/κa) ∼ 103 · · · 104 times at each iteration
for the typical large-scale SAR image formats [1, 2]) that we
will verify and analyze in more details further on in Section 6.
4.3. DEDR-POCS Convergence. We accomplish our algo-
rithmic developments at the SW codesign stage with the
analytical analysis of the convergence issues related to
the developed unified DEDR-POCS method. Following
the POCS regularization formalism [9], the convergence
enforcing projectors in the iterated procedure (32) are to be
constructed formally as
P λι = I− λι(Pι − I); ι = 1, 2, . . . ;
P1 = Pa⊥r ; P2 = Pκa⊥κr ; P3 = P+,
(33)
where λι; ι = 1, 2, 3 represent the relaxation (speeding-up)
regularization parameters and I is the identity operator. The
iteration rule (32) for the composed regularizing projectors
(33) becomes
B̂[n+1] = P λ3 P λ2 P λ1 B̂[0]








; i = 0, 1, . . .
(34)
and is guaranteed to converge to the point in the intersection
of the convex sets specified by P λι provided 0 < λι < 2
for all ι = 1, 2, 3 regardless of the initialization B̂[0] that
is a direct sequence of the fundamental theorem of POCS
[9, page 1066]. Note that the employed specifications of the
projectors in (33), that is, P1 = Pa⊥r ; P2 = Pκa⊥κr ; P3 =
P+; with λι = 1 for all ι = 1, 2, 3, and B̂[0] = L{b̂MSF},
satisfy these POCS convergence conditions, in which case
the formal convergent POCS procedure (34) becomes the
developed above fixed-point DEDR-POCS algorithm given
by (32).
Now we are ready to proceed with the hardware codesign
implementation stage of our development.
5. Hardware/Software Codesign Methodology
The all-software execution of the prescribed RS image
formation and reconstruction operations in modern high-
speed personal computers (PC) or any existing digital signal
processors (DSP) may be intensively time consuming [15].
These high computational complexities of the general-form
DEDR-POCS algorithms make them definitely unacceptable
for real time PC-aided implementation.
When a coprocessor-based solution is employed in the
HW/SW codesign architecture, the computational time can
be drastically reduced [16]. As an introductive example,
consider computation of the matrix product AB, where A
and B define matrices of sizes k ×m and m× p, respectively.
Then to execute this product in a conventional sequential
way, k × m × p multiply accumulation (MAC) operations
are required. Therefore, the computational time required by
a sequential processor or a high-speed PC for the all-software
execution of the matrix product is of the order O(k × m ×
p). With the incorporation of a parallel and/or pipelined
coprocessor alongside an embedded processor the required
computational time is immediately reduced to O(k × m ×
p/n), where n defines the employed parallelism level.
In this section, we present a concise description of the
proposed HW/SW codesign approach particularly adapted
to the DEDR-POCS type algorithms, and demonstrate its
flexibility in performing an eﬃcient HW implementation of
the SW processing tasks with systolic arrays as coprocessors
units. In [10], we presented an initial version of the HW/SW-
architecture for implementing the digital processing of a
large-scale RS imagery in other operational context. The
architecture developed in [10] did not involve systolic arrays
and is considered here simply as a reference for the new
pursued HW/SW codesign paradigm presented in Figure 3,
where the corresponding blocks are to be designed to speed-
up the digital signal processing operations of the DEDR-
POCS-related algorithms developed at the previous SW stage
of the overall HW/SW to meet the real time imaging system




























Coprocessor 1 Coprocessor k
Figure 3: Illustration of the HW/SW codesign paradigm.
requirements. Our codesign methodology encompasses the
following general stages: (i) algorithmic implementation
(reference simulation in the MATLAB platform); (ii) compu-
tational tasks partitioning process (definition of the number
of coprocessors), and (iii) operational mapping process
employed to map the computation execution tasks onto the
HW blocks (reconfigurable arrays).
In the HW design, we use the precision of 32 bits
for performing all fixed-point operations, in particular, 9-
bit integer and 23-bits decimal for the implementation of
each coprocessor. Such the precision guarantees numerical
computational errors less than 105 referring to the MATLAB
Fixed Point Toolbox [17]. Using such the MATLAB fixed-
point toolbox we generated all the numerical test sequences
required to verify computationally the proposed HW/SW
codesign methodology (i.e., test sequences for performing
the SW simulation and for the HW verifications). The results
of such SW simulation and HW performance analysis will be
presented and discussed further on in Sections 6.3 and 6.4.
Finally, the host processor (the standard MicroBlaze embed-
ded processor [18] in this study) performs the following
functions: loading and storing of images, data transfer to the
HW coprocessors, and data formatting for performing the
correspondent mathematical operations.
5.1. Algorithmic Implementation. In this section, we develop
the procedures for computational implementation of the
DEDR-POCS-related RSF and RASF algorithms in the MAT-
LAB platform. This reference implementation scheme will be
next compared with the proposed HW/SW codesign archi-
tecture based on the use of the single Field Programmable
Gate Array chip.
To implement the iterative fixed-point DEDR-POCS-
related RSF and RASF algorithms (32), we first, specify the
corresponding computational procedures in the rectangular
scene frame r = (x, y) ∈ R over the azimuth (horizontal
axis, x) and range direction (vertical axis, y), respectively.
Such multi-stage procedures are formalized via the unified
algorithmic scheme presented in Table 1.
From the analysis of the algorithmic implementation
scheme of Table 1, we outline the following important
remarks regarding the possible HW/SW partitioning of the
computational tasks required for implementing both RSF
and RASF algorithms.
(i) First, the PSMs (25), Ψa and Ψr factorized over
the azimuth and range axes can be calculated con-
currently that we refer to as Ψa||Ψr , where symbol
|| specifies now the concurrent execution of the
corresponding computational operations.
(ii) Second, the zero step iteration (MSF image) B̂[0]
can be computed using the same factorized structure
analogues to Ψa||Ψr .
(iii) Third, the reconstructed image B̂[i+1], at the current
(i + 1)st iteration step is an iteratively updated
function of {B̂[i]} computed at the previous ith
iteration that also admits the factorized computing.
5.2. Partitioning Phase. One of the challenging problems of
the HW/SW codesign is to perform an eﬃcient HW/SW
partitioning of the computational tasks. The goal of the
partitioning stage is to find which computational tasks can
be implemented in an eﬃcient parallelized HW/SW archi-
tecture seeking for balanced area-time trade-oﬀs between
diﬀerent admissible design solutions [18–20]. In this study,
the iterative fixed-point POCS-DEDR regularized algorithm
has been partitioned at the algorithmic level to minimize
the overall signal processing (SP) time via transferring some
required reconstructive SP functions from the SW to the HW.
The solution to this problem requires, first, the definition
of a partitioning model that meets all the specification
requirements (functionality, goals and constraints).
The system partitioning is clearly influenced by the target
architecture onto which the HW and the SW will be mapped.
The target architecture proposed in this study consists
of one 32 bits RISC instruction set embedded processor
(MicroBlaze) running the software and three dedicated
coprocessors implemented by systolic processor arrays.
We begin with the specifications of the system-level
partitioning functions and detailing the selected design
quality attributes for the HW/SW codesign aimed at the
EURASIP Journal on Advances in Signal Processing 9
Table 1: Computational scheme for implementing the POCS-regularized RSF and RASF algorithms.
(i) Data acquisition u( j); j = 1, . . . , J
(ii) Formation of the current RS correlation data matrix Y (4)
(iii) Specification of the observation noise correlation model {Rn; R−1n }
(iv) Specification of the POCS model parameters {κa; κr}
(v) Specification of the POCS operator components {P+; Pκa⊥κr ; Pa⊥r}
(vi) Specification of the azimuth-range SFOs {Sa; Sr}
(vii) Computations of the azimuth-range PSMs {Ψa;Ψr} (25)
(viii) Formation of the azimuth-range POCS operators see (31)
(ix) Formation of the MSF image B̂[0] = L{b̂[0]} (30)
(x) Iterative POCS-RSF image enhancement B̂RSF (using (32)) with the robust updating A
−1 = diag−1(b̂[0]) of the iterative
reconstruction operator (27)
(xi) Iterative POCS-RASF image reconstruction B̂RASF (using (32)) employing the adaptive updating A
−1 = diag−1(b̂[i]) of
the iterative reconstruction operator (27)
(xii) Termination of the iteration procedures
(xiii) Image enhancement/reconstruction performance analysis using the adopted quality metrics.
definition of the computational tasks that can be imple-
mented in a systolic computing form, namely: hardware area
(ha), hardware execution time (ht), software execution time
(St), and the selected system resolution (n); where max ha,
max ht and max St represent the upper bounds of these
constraints. In particular, for implementing the iterative
fixed-point POCS-regularized RSF and RASF algorithms, the
partitioning process must satisfy the following performance
requirements.
(i) In order to ensure a viable solution, the system must
always satisfy the constraints: 0 ≤ ha < max ha,
0 ≤ ht < max ht, for each ith hardware coprocessor
{Γi; i = 1, 2, 3} and 0 ≤ St < max St, for the
embedded processor Ω. These three hardware copro-
cessors {Γi} and the embedded processor compose
the target architecture D = {Ω,Γ1,Γ2,Γ3,Υ}, for
the pre-selected FPGA Υ with the corresponding pre-
determined architecture constraints C : {0 ≤ ha <
max ha; 0 ≤ ht < max ht; 0 ≤ St < max St} [18].
(ii) Each block implementation {G(Γi)}, G(Ω) must
satisfy the predefined execution time performance
requirements [18]: τ{G(Γi | Ci); i = 1, 2, 3} and
τ{G(Ω | C4)} conditioned by the specified above
architecture constraints {Ci : {0 ≤ hti < max hti; 0 ≤
hai < max hai}∀i = 1, 2, 3}, and C4 : 0 ≤ St <
max St, correspondingly.
Next, the system architecture D is to be specified to meet
the desirable time consuming performances via bounding



















where TMATLAB represents the execution time required for
implementing the corresponding DEDR-POCS-related RSF
and RASF algorithms in the standard MATLAB computa-
tional environment.
Following such partitioning paradigm, we decompose
now the fixed-point POCS-regularized RSF and RASF
algorithms developed at the SW-design into the standard
MicroBlaze embedded processor Ω with three coprocessors
{Γi; i = 1, 2, 3} as illustrated in Figure 4. The first copro-
cessor Γ1 (referred to as MSF coprocessor) implements the
zero-step iteration to form the MSF image B̂[0] = L{b̂[0]}
specified by (30). The second coprocessor Γ2 (referred to
as PSM coprocessor) implements the computations of the
PSM Ψa||Ψr given by (25) concurrently over the azimuth
and the range directions. The third coprocessor Γ3 (referred
to as Iterative POCS coprocessor) performs the required
robust updating A−1 = diag−1(b̂[0]) for implementing the
RSF algorithm and the adaptive updating A−1 = diag−1(b̂[n])
for implementing the POCS-regularized RASF image recon-
struction algorithm, respectively. All three coprocessors
{Γi; i = 1, 2, 3} are next implemented as systolic processor
arrays while the embedded processor Ω executes all the
required operational and control functions: loading and
storing of the images, data transfer to the HW coprocessors,
and data formatting for execution of all required numerical
operations.
5.3. Mapping Phase. In this section, we proceed with the
development of the procedure for mapping the corre-
sponding algorithms onto array processors. A systolic array
consists of a number of processor elements (PEs) with the
corresponding interconnection links among the PEs, and the
mapping technique transforms a space representation into
a space-time representation [21]. Systolic arrays are being
used for matrix operations and required specific processing
algorithms, such as, transform techniques, matrix multipli-
cation, convolution, and so forth, [21, 22]. The methodology






















Figure 4: Illustration of the partitioning phase of the HW/SW codesign.
of mapping the algorithms onto array structures is depicted
in Figure 5.
First, to achieve the desired maximal possible parallelism
in an algorithm, we perform the analysis of the data
dependencies in the corresponding computations. Then, the
algorithm is transformed into a single assignment algorithm
without global communication. A dependence graph (DG) is
used to analyze these data dependencies of the corresponding
algorithms [21]. Following [21], DG is defined as G = [P, E],
where P represents a set of nodes and E is a set of arcs (or
edges), that is, each edge e ∈ E connects the corresponding
pair of nodes p1, p2 ∈ P and the connection is formalized by
e = p1 e−→ p2.
Second, we employ the systolic design paradigm to
map a high dimensional (N-dimensional) DG to a lower
dimensional Signal to Flow Graph (SFG) [21]. Recall
that the systolic array is a space-time representation, in
which the function description defines the behavior within
a node, whereas the structural description specifies the
interconnections (edges and delays) between the nodes [21,
22]. In order to derive a regular systolic array architecture
with minimum number of nodes, we employ the linear
projection approach for processor assignment following the
methodology developed in [21, 22], that is, the nodes of
the DG in a certain straight line are projected onto the
corresponding PEs in the processor array represented by the
corresponding projection vector d. Thus, we seek for a linear
order reduction transformation [22]:
Φ : GN −→ ĜN−1 (36)
that maps the N-dimensional DG (GN ) onto the (N – 1)-
dimensional SFG (ĜN−1).
Such the desired linear transformation matrix Φ admits








Here, Π defines a (1× p) vector composed of the first row of
Φ that determines the time scheduling. This vector indicates
the normal direction of the equi-temporal hyper-planes in
the DG, “equi-temporal” being understood in the sense that
all the nodes on the same hyper-plane must be processed at
the same time [22]. The submatrix Σ of (p−1)×p dimension
(the rest rows of Φ), determine the space processor. With this
mapping, we are now ready to proceed with the construction
of the required regular (N − 1)-dimensional systolic arrays.
5.4. HW Implementation. Once the HW/SW codesign has
been defined, the three coprocessors employed in the archi-
tecture exemplified in Figure 4 can be implemented using the
HW systolic arrays. In this study, we are oriented at the use
of the Xilinx MicroBlaze soft processor that employs the On
Chip Peripheral Bus (OPB) for transferring the data from/to
the memory to/from the coprocessor [23]. Such the OPB is
a fully synchronous bus that connects other separate 32 bit
data buses. This system architecture (based on the FPGA
XC4VSX35-10ﬀ668 with the embedded processor and the
OPB buses) restricts the corresponding processing frequency
to 100 MHz. The typical rate of the OPB bus is 133 MByte/s,
providing that each data transfer of 32-bits is accomplished
at 30.05 ns [23]. Next, to avoid multiple data transfer from
the embedded processor data memory to the coprocessors, a
register file is to be implemented inside each coprocessor.
The first systolic array (referred to as the MSF coproces-
sor) implements the zero-step iteration of the unified fixed-
point DEDR-POCS procedure (32) to form the MSF image
B̂[0] = L{{S+YS}diag} as specified by (30). The function
of this systolic array is to perform the triple matrix mul-
tiplication, where matrix S has the band-Toeplitz structure
[5, 6] with the width of the non-zero strip over the azimuth
frame equal to 2κa. Following the methodology addressed
in the previous section, the triple matrix multiplication
corresponding to the MSF function can be implemented
using a cascade systolic array. First, the multiplication of a
band-Toeplitz matrix and a rectangular matrix is performed
EURASIP Journal on Advances in Signal Processing 11
and then, the result is multiplied with another band-Toeplitz
matrix. Each slide of the DG in the multiplication of the
band-Toeplitz matrix and the rectangular matrix is employed
using the following specifications in the transformations
defined by (37): Π = [1 1]T for the vector schedule, d =
[1 0]T for the projection vector and Σ = [0 1]T for the
space processor that determine the resulting transformation
matrix Φ (37). In Figure 6(a), we illustrate the triple matrix
multiplication mapped into a cascade systolic array with
the relevant MSF systolic array architecture exemplified in
Figure 6(b). The corresponding computations require only
O(2κa × Kx × 2κa) fixed-point operations, with 2κa  Kx,
where, as previously, 2κa defines the width of the non-zero
strip in the factorized band-Toeplitz PSM (25) and Kx is the
original image dimension over the azimuth frame.
The MSF coprocessor systolic architecture of Figure 6(b)
consists of identical linearly-connected processing elements
(PEs). In our case, the internal structure of each PE
contains a multiplier and an adder. Each PE receives 32-
bits operands and generates 64-bits product. Then, the
product is truncated to 32-bits with a fixed-point adopted
representation of 9 integers and 24 decimals. Next, since
the band-Toeplitz type matrix Sa is preloaded, the incoming
data Y are transmitted in parallel to the corresponding PEs.
After 2κa cycles of clock, the data outputs are produced
and transferred to the registers (gray blocks in Figure 6(a)).
Once the first of the triple matricial product is completed,
the data transfer to the second array begins. The control
unit block guarantees the correct synchronization between
the arrival of the input data and the computations for each
PE. The result buﬀer of Figure 6(b) consists of a shift buﬀer
used to store the 2ka × Kx × 2ka elements generated in
parallel by the boundary PEs. Finally, the bus interface unit
realizes the communication between the systolic array and
the embedded processor.
The second systolic array (referred to as the PSM copro-
cessor) implements the computation of the Point Spread
Matrix (PSM) function Ψ = S+S concurrently over the
azimuth and range axes, that is Ψa||Ψr , where, as previously,
symbol || specifies the concurrent execution of the corre-
sponding computational operations. In the PSM function,
both matrices Ψa and Ψr are band-Toeplitz type matrices
(dim{Ψa} = Kx × Kx; dim{Ψr} = Ky × Ky) with the widths
of the non-zero strips equal to 2κa and 2κr , correspondingly,
where due to the PSM sparseness, 2κa  Kx and 2κr 
Ky . Thus, to perform the required reconstruction over the
azimuth direction, it is possible to achieve full parallelism
with only an 2κa × 2κa rectangular array (as opposed to
an original full-dimensional Kx × Kx array in the general
case [10, 21]). Due to the range-azimuth factorization, the
same parallelism is achievable in the range direction as
well with the corresponding 2κr × 2κr rectangular systolic
array. The matrix multiplication of two band-Toeplitz type
matrices employs now the following specifications in the
transformations defined by (37): Π = [1 1 1]T for the
vector schedule, d = [1 1 1]T for the projection vector,
Σ =
[ 1 0 1
0 1 1
]
for the space processor that determine the
resulting transformation matrix Φ (37). The topological
distribution of the processing elements (PEs) in such systolic
structure is shown in Figure 7(a). The corresponding PSM
systolic coprocessor architecture is presented in Figure 7(b)
with three independent directions of data flow.
The third coprocessor (referred to as the Iterative POCS
coprocessor) performs the adaptive updating of the iterative
reconstruction operator T[n] in the corresponding fixed-
point DEDR-POCS procedure (32). The key operations of
this coprocessor are to perform the standard 1-D convolution
and the vector-matrix multiplication. The systolic array for
performing the 1-D convolution employs now the following
specifications in the transformations defined by (37): Π =
[1 2]T for the vector schedule, d = [1 0]T for the
projection vector, and Σ = [0 1]T for the space processor
that determine the resulting transformation matrix Φ (37).
Figure 8(a) illustrates the 1-D convolution systolic array and
Figure 8(b) presents the relevant systolic architecture.
In summary, the developed systolic architectures per-
form the parallel and pipelined schemes which exploit the
proposed above mapping methodology. These architectures
provide the necessary HW-level implementation of the SW-
optimized complex multi-purpose RS imaging algorithms.
6. Simulations and Performance Analysis
6.1. Simulation Experiment Specifications. In the verifica-
tion simulation experiments, we considered a conventional
single-look SAR with the fractionally synthesized aperture as
an RS imaging system [1, 2]. Recall, that signal formation
operator (SFO) of such a SAR is factored along two axes in
the image plane [3]: the azimuth or cross-range coordinate
(horizontal axis, x) and the slant range (vertical axis, y),
respectively. We considered the conventional triangular SAR
range ambiguity function (AF) [3] Ψr(y) and Gaussian
approximation [5, 6], Ψa(x) = exp(−(x)2/a2), of the SAR
azimuth AF with the adjustable fractional parameter, a.
Note that in the imaging radar applications [3, 4], an AF
is referred to as the continuous-form approximation of the
PSM Ψ defined by (25) and serves as an equivalent to the
point spread function in the conventional image processing
terminology [9]. The image degradation and noising eﬀects
were incorporated to simulate the process of formation of
the degraded speckle-corrupted MSF images. First, following
[1, 3] the degradation in the spatial resolution due to
the fractional aperture synthesis mode were simulated via
blurring the original image with the range AF Ψr(Δy) along
the y axis and with the azimuth AF Ψa(Δx) along the x axis,
respectively. Next, the degradations at the image-formation
level due to the propagation and calibration uncertainties
were simulated using the statistical model of a SAR image
defocusing [2, 3]. For a considered single-look SAR, the con-
ventional MSF image formation algorithm (30) implies, first,
application of the regular adjoint SFO S+ to the zero-mean
Gaussian data realization u, and second, performing the
element-by-element (i.e., pixel-by-pixel) squared detection
of S+u to compose the corresponding SSP pixel estimates
{b̂MSF k = |{S+u}kk|2; k = 1, . . . ,K}. Consequently, the
MSF pixel estimates {B̂[i](kx ,ky) = L{b̂MSF k}} are chi-squared


























Figure 5: Mapping Design Methodology.
Matrix {Y}
































































Figure 6: MSF implementation: (a) cascade systolic array for performing the MSF function; (b) MSF systolic architecture.

































Figure 7: PSM implementation: (a) systolic array for performing the PSM function; (b) PSM coprocessor systolic architecture.
































Figure 8: Convolution function: (a) systolic array for performing the 1D convolution; (b) systolic array architecture.
distributed χ22 with two degrees of freedom, and such a
distribution is a negative exponential Rayleigh distribution
[2, 9]. Thus, to comply with the technically-motivated MSF
image formation scheme, the composite multiplicative noise
was simulated as a realization of the χ22-distributed random
variables with the pixel mean value assigned to the actual
degraded scene image pixel that directly obeys the statistical
speckle model [2, 5, 6]. Such signal-dependent multiplicative
image noise dominates the additive noise component in
the data in the sense that NΣ  N0, hence the estimate
N̂Σ performed empirically via the application of the local
statistics method [2] was used to adjust the regularization
degrees of freedom (regularization factors) in all simulated
DEDR-related SSP reconstruction procedures.
We have run the simulation experiments for both certain
and uncertain operational scenarios. In the both scenarios,
we considered the MSF, RSF and RASF algorithms from the
DEDR-POCS family (22). Also, to compare the developed
algorithms with the conventional SAR image enhancement
techniques [1–3], the celebrated Lee adaptive de-speckling
14 EURASIP Journal on Advances in Signal Processing

















Figure 10: Original test scene represented in MATLAB pseudocolor scale.
filter based on the local statistics method [2] was simulated.
The family of four simulated techniques were renumbered as
p = 1, . . . , 4. The first one (p = 1) relates to the conventional
MSF estimator (30) that employs the adjoint SO F(1) = S+.
This degraded MSF image B̂(1) = L{b̂MSF} was then post-
processed applying the Lee adaptive de-speckling filter [2]
that we refer to as the adaptively de-speckled MSF image
B̂(2), that is, p = 2. Next, the non-adaptive RSF algorithm
with the solution operator F(3) = FRSF defined by (19) was
applied to enhance the original MSF image B̂(1) employing
the iterative DEDR-RSF version of the unified fixed-point
iterative procedure (32); the resulting DEDR-RSF enhanced
image was specified as B̂(3) = L{b̂RSF} and numbered as
p = 3. Last, the fourth simulated technique corresponds
to the adaptive DEDR-RASF method (32) with the optimal
solution operator F(4) = FRASF given by (21); the resulting
adaptively enhanced DEDR-RASF image was specified as
B̂(4) = L{b̂RASF} and numbered correspondingly as p = 4.
In the second (uncertain) simulated scenario, the system AF
was additionally distorted over the azimuth frame within
the realistic interval of Δκa = 0.07κa that corresponds
to the partially uncompensated carrier trajectory deviations
interval [2, 10]. For both scenarios, the simulations were
run for diﬀerent composite signal-to-noise ratios (SNR) μ
defined as the ratio of the average signal component in
the rough image formed using the MSF algorithm (30)
to the relevant noise component in the same image,μ =
(b0/NΣ)(tr{Ψa} tr{Ψr})−1 × tr{(Ψa)2} tr{(Ψr)2}, where b0
represents the average gray level of the original scene image.
6.2. Performance Metrics. The first adopted quality metric
was borrowed from the classical image reconstruction appli-
cations [9] defined as an improvement in the output signal-
to-noise ratio (IOSNR):












)2 ; p = 2, 3, 4, (38)
EURASIP Journal on Advances in Signal Processing 15
Table 2: IOSNR values providedwith three simulated DEDR-related methods, p = 2, 3, 4 (2) adaptive despeckling filter; (3) DEDR-RSF; (4)
DEDR-RASF; results are reported for the certain and uncertain simulated scenarios.
SNR (dB)
IOSNR(p); p = 2, 3, 4
FIRST (CERTAIN) SCENARIO: κr = 6; κa = 15 SECOND (UNCERTAIN) SCENARIO: κr = 6; κa = 18
IOSNR(2) IOSNR(3) IOSNR(4) IOSNR(2) IOSNR(3) IOSNR(4)
5 1.83 3.73 9.12 1.31 3.45 6.36
10 2.47 4.80 10.11 1.96 4.14 7.93
15 3.25 7.87 11.12 3.21 6.67 8.52
20 4.64 9.05 13.42 4.02 8.32 10.28
where bk represents the value of the kth element (pixel)
of the original image B, b̂(MSF)k represents the value of the
kth element (pixel) of the degraded image formed applying
the MSF technique (37), and b̂
(p)
k represents a value of the
kth pixel of the image reconstructed with three simulated
enhancement methods, p = 2, 3, 4 where p = 2 corresponds
to the adaptive de-speckling algorithm based on the local
statistics method [2], p = 3 corresponds to the POCS-
RSF algorithm and p = 4 corresponds to the POCS-RASF
algorithm, that is, the best one from the developed DEDR-
POCS family, respectively. The second adopted metric, the
so-called mean absolute error (MAE), was employed as a
metric suitable for quantification of edges and fine detail
preservation in the reconstructed image defined as [15]















⎠; p = 2, 3, 4. (39)
According to these quality metrics, the higher is the IOSNR,
and the lower is the MAE, the better is the improvement
of the image enhanced/reconstructed with the particular
employed algorithm.
6.3. Simulations. In this study, the simulations were perfor-
med with a large scale (1K-by-1K) pixel-format image
borrowed from the real-world high-resolution terrain
SAR imagery (south-west Guadalajara region, Mexico
[24]). The quantitative measures of the image enhance-
ment/reconstruction performance gains achieved with the
particular employed POCS-RSF and POCS-RASF techniques
for diﬀerent SNRs evaluated with two diﬀerent quality
metrics (38), (39) are reported in Table 2. Figure 9 shows the
original scene image (not observable with the simulated SAR
systems). Figure 10 illustrates the same original test scene
represented in MATLAB pseudocolor scale.
The images of Figures 11(a) through 11(h) present
the results of image formation and enhancement applying
diﬀerent DEDR-related estimators without model uncer-
tainties as specified in the figure captions. In the second
simulated scenario, the fractional SAR system suﬀered from
more severe degradations because of the additional system
defocusing and multiplicative speckle noising due to the
operational scenario uncertainties. Figures 12(a) thru 12(h)
present the results of image formation and enhancement
applying diﬀerent DEDR-related estimators in the simulated
uncertain operational scenario as specified in the figure cap-
tions. From the analysis of the reported simulation results, it
is evident that the RASF method overperformed the robust
nonadaptive RSF in both simulated scenarios. This demon-
strates that employing the adaptive RASF technique from
the DEDR-POCS family one could substantially improve the
quality of the RS images (reconstructed from both certain
and uncertain RS measurement data) approaching in the
same time (near) real-time computational performances.
Next in Figure 13, we present the convergence curves
related to the iterative-form implementation of the POCS-
RSF/RASF techniques for the test case 15 dB SNR (i.e., μ =
15 dB). From the analysis of these curves one can deduce
that after 40 iterations both POCS-RSF and POCS-RASF
algorithms begin to suﬀer from some numerical instabilities.
This type of numerical instability is a subtle issue in
constructing the regularized iterative techniques for diﬀerent
ill-conditioned problems, for example [1, 3, 9], and so forth.
Moreover, the relationship between the resulting IOSNR
quality metric and the visual reconstructed image quality is
not fully understood, although, of course, one would expect
a high degree of correlation between the two [9]. These
observations are in concord with the similar observations
from other studies of the inverse imaging problems in other
ill-posed contexts, for example, [1, 2, 9, 10]. In our case, due
to the POCS regularization, the appearance of the DEDR
reconstructed images demonstrated substantial improve-
ment up to 15 iterations from the MSF starting point. Next,
the appearance of the reconstructed images changed very
little from that of the 15 to 25 iterations. The changes became
perceptually undistinguishable after 25· · · 30 iterations. This
behavior can be used as a motivation for the empirical
stopping rule at 25· · · 30 iterations of the POCS-regularized
iterative POCS-RSF and POCS-RASF algorithms.
6.4. HW/SW Codesign Performance Analysis. In this section,
we complete our study with the comparative analysis of
the computational complexities of the simulated iterative
DEDR-POCS algorithms implemented using the systolic
coprocessors constructed following the addressed HW/SW
codesign approach. The synthesis metrics related to the
implementation of the systolic arrays architectures as copro-
cessors are summarized in Table 3. First, we exemplify the
MSF, PSM and iterative POCS coprocessor architectures for
the following simplified specifications: data matrices of size
12 × 12 and two Band-Toeplitz PSF matrices of the same
16 EURASIP Journal on Advances in Signal Processing
Table 3: Synthesis metrics. Specifications for data matrices of size 12×12 and two Band-Toeplitz PSF matrices of the same 12×12 pixel size
with equal bandwidths of 2κa = 3 and 2κr = 3.
Synthesis metrics Systolic array coprocessors
MSF PSM Iterative POCS
Number of slices 905 (5.89%)c 242 (1.57%) 216 (1.41%)
Number of aDSP’48 192 (100%) 9 (4.68%) 12 (6.25%)
Number of bLUTs 932 (3.03%) — —
Number of flip-flops 1845 (6.01%) 480 (1.56%) 432 (1.40%)
Maximum frequency 115.30 MHz 152.20 MHz 148.69 MHz
Maximum pin delay 8.67 ns 6.57 ns 6.72 ns
a
DSP’48 are dedicated DSP blocks in high-end FPGAs–such as the Xilinx XtremeDSP slice in Virtex-4 [18].
bLUTs is an acronym to Look Up Tables structures.
cData in parenthesis (.) report the percentages of the occupied HW resources related to the particular synthesis metrics in terms of the total available on the relevant
device.
Table 4: Processing times required for implementing the conventional DEDR and the developed POCS-regularized (SW/HW codesign-
based) unified DEDR-POCS techniques (RSF and RASF).
Implementation method
Processing time [seconds]
RSF (per iteration) RASF (per iteration)
Hypothetical Full-Format Implementation(Evaluated PC-Oriented Implementation) 5171.6 5655
Factorized Fixed-Point POCS-Regularized Implementation (PC-Oriented) 19.70 20.05
Previous HW/SW codesign-based implementation [10] (without systolic arrays) 7.82 7.985
Proposed HW/SW codesign-based implementation (with systolic arrays) 2.51 2.56
Note–Processing times may vary depending on the processor type, CPU memory and the software used.
12 × 12 pixel size with equal bandwidths of 2κa = 3 and
2κr = 3 pixels. The relevant SP performance analysis results
are resumed in Table 3. Next, in Figures 14(a) through 14(c),
we summarize the relevant HW synthesis performances for
the realistic case of large-scale processed RS scenes (e.g., to
1K×1K pixel size) and report the overall resource utilization
performances attained with the proposed HW coprocessors
architectures for diﬀerent number of processing elements
(PEs).
Next, the reported metrics of Table 3 specify the area
and time behaviors of the corresponding hardware systolic
arrays, that is the corresponding MSF, the PSM, and the
iterative POCS architectures specified above in Section 5.
From the analysis of the data reported in Table 3 and Figures
14(a) through 14s(c), one can deduce the following: With
the proposed HW/SW codesign architecture (in which the
embedded processors iterate properly the corresponding SP
procedures) the DEDR-POCS-related algorithms can be eﬃ-
ciently implemented in an iterative fixed-point fashion also
for the realistic large-scale scenes (e.g., 1K × 1K pixel size).
Pursuing the proposed systolic computing architecture con-
cept, the increased scene dimensionality requires the proper
segmentation of the scene frame with the parallelized com-
puting performed over the partitioned segments followed
by the relevant integration of the overall partial processed
data. Such partitioned systolic HW/SW codesign computing-
oriented processing can be performed directly following the
architecture design concept proposed and specified in the
previous Section 5.4. Additionally, the scalability in terms
of Flip-Flops, Slices and LUTs (i.e., the HW resources of
the FPGA) for the proposed MSF, PSM and iterative POCS
coprocessors are reported in Figures 14(a) through 14(c).
In fact, the corresponding DEDR-related SP algorithms can
be eﬃciently implemented in a Field Programmable Systems
on Chip (FPSoC) mode in spite of employing conventional
systems based on multi-FPGAs or PC-Clusters [12–14, 16].
The latter is practically inspired and desirable for a wide
range of RS and general SP applications due to the large
range density of the existing FPGAs that incorporate huge
resources of logical gates, block RAM memory modules and
soft or hard-embedded processors integrated on the same
chip with the relevant custom co-processing HW blocks,
and so forth. For example, an alternative approach for high-
speed computational implementation of the reconstructive
RS image processing based on the use of clusters of PCs
was presented in [12–14]. In [12], the cluster NSPO Parallel
TestBed for performing parallel radiometric and geometrical
corrections of the large-scale 3600×2944-pixel RS images
was implemented. The reconstructive image processing was
conducted using a PC-Cluster composed by three PCs each
one with a Pentium-III 550 MHz with 128 MB of RAM con-
nected with 100 Mbps Fast-Ethernet LAN. The processing
time achieved with such three-PCs cluster was only 33.3
seconds (near-real time for conventional RS users), while
the corresponding processing performed with one single
processor required 84.65 seconds. In [13, 14], another kind
of parallel architecture was implemented for morphological
classification of hyperspectral RS imagery at the NASA’s









































































Figure 11: Simulation results for certain observation scenario: (SNR μ = 10 dB): (a) degraded scene image formed applying the MSF
method; (b) the same degraded scene represented in the MATLAB pseudo-color scale; (c) image reconstructed applying the Lee adaptive
de-speckling algorithm; (d) the same adaptively de-speckled scene represented in the MATLAB pseudo-color scale; (e) image reconstructed
applying the POCS-RSF algorithm; (f) image reconstructed applying the POCS-RSF algorithm represented in the MATLAB pseudo-color
scale; (g) image reconstructed applying the POCS-RASF algorithm; (h) image reconstructed applying the POCS-RASF algorithm represented
in the MATLAB pseudo-color scale.









































































Figure 12: Simulation results for uncertain observation scenario: (SNR μ = 10 dB): (a) degraded scene image formed applying the MSF
method; (b) the same degraded scene represented in the MATLAB pseudo-color scale; (c) image reconstructed applying the Lee adaptive
de-speckling algorithm; (d) the same adaptively de-speckled scene represented in the MATLAB pseudo-color scale; (e) image reconstructed
applying the POCS-RSF algorithm; (f) image reconstructed applying the POCS-RSF algorithm represented in the MATLAB pseudo-color
scale; (g) image reconstructed applying the POCS-RASF algorithm; (h) image reconstructed applying the POCS-RASF algorithm represented
in the MATLAB pseudo-color scale.




































(b) Uncertain observation scenario





































































(c) Iterative POCS coprocessor
Figure 14: Resource utilization varying the number of PEs.
20 EURASIP Journal on Advances in Signal Processing
Goddard Space Flight Center. The parallel classifier of [14]
uses 256-processor Beowulf cluster (Thunderhead cluster)
with hybrid neural parallelism that enables such a system
to perform an accurate classification of the hyperspectral RS
scenes in only 17 seconds.
As a result, advances on high performance computing as
well as on specialized high performance hardware modules
are necessarily required to achieve the near-real processing
time performances for complex RS algorithms.
Last, we compared the required processing time of the
general-form RSF/RASF DEDR-related procedures (22) and
the iterative fixed-point DEDR-POCS-regularized algorithm
(32), both implemented using the conventional MATLAB
software in a personal computer (PC) running at 3 GHz
with a AMD Athlon (tm) 64 dual-core processor and 2 GB
of RAM memory. Also, the same DEDR-related algorithms
were implemented using the proposed HW/SW codesign
architecture (soft- and hardware) without systolic and with
systolic arrays employing the Xilinx FPGA XC4VSX35-
10ﬀ668. The corresponding comparative results are reported
in Table 4. Analyzing these reported results, one may
deduce the following. The iterative fixed-point DEDR-
POCS-regularized algorithm (32) manifests the (near) real
time high-resolution enhancement/reconstruction of the RS
imagery. The implementation of the proposed HW/SW
codesign architecture helps to reduce drastically the overall
processing time. Particularly, the proposed implementation
of the iterative POCS-regularized RASF algorithm with
systolic arrays takes only 2.56 seconds for each iteration of the
image reconstruction. In total, it takes 64 sec for 25 iterations.
This new computation time is approximately 3 times less
than the previous implementation without systolic arrays
[10], 8 times less than the corresponding processing time
achievable with the MATLAB POCS-based implementation,
and it is ∼103 times less than the hypothetical processing
time required for implementing the full-format conventional
general-form DEDR-RASF algorithm (22) without POCS
regularization and without systolic computing.
7. Concluding Remarks
The principal result of the undertaken study relates to the
digital signal processing-oriented solution of the RS image
enhancement/reconstruction problems in a (near) real time
computing mode (the “near real time” being understood in
context of conventional RS users) via exploiting the aggre-
gated hardware/software (HW/SW) codesign paradigm that
results in an eﬃcient hardware implementation architecture
based on the use of systolic array processors. We have
approached the goal of the (near) real time computational
implementation of the enhancement/reconstruction of the
RS imagery from two directions. First, we have analytically
established that to alleviate the problem ill-posedness and
reduce the overall computational load of the large-scale
image enhancement/reconstruction tasks at the algorithmic
processing level, some special form of descriptive experiment
design projection-type numerical regularization must be
employed. This stage was developed and addressed here
as the unified DEDR method, and the eﬃcient fixed-
point numerical iterative technique that incorporates the
proper construction of the relevant orthogonally factor-
ized regularizing projector onto convex sets (POCS) in
the solution domain was designed and specified for the
particular employed RS sensor system, namely, the side-
looking imaging synthetic aperture radar (SAR) operating
in both certain and uncertain scenarios. We have also
examined how such SAR-adapted POCS-regularized fixed-
point iterative technique can be executed concurrently over
the orthogonal range-azimuth coordinates with optimal use
of the sparseness properties of the overall SAR system
point spread function characteristics. The algorithmic-level
advantages of such unified DEDR-POCS-regularized RS
image enhancement/reconstruction techniques relate to the
theoretically guaranteed convergence of the corresponding
fixed-point iterative process with the proper factorization of
the numerical reconstructive procedures over the orthogonal
range-azimuth directions in the representation image frame.
Second, we have examined that pursuing the proposed
HW/SW codesign paradigm and employing the systolic
arrays as co-processing units, the (near) real time image
processing requirements can be achieved due to performing
the corresponding computations in an eﬃcient systolic archi-
tecture mode. The unified algorithmic (software-level, SW)
and systematic (hardware-level, HW) codesign approach was
verified via computer simulation experiments indicative of
its eﬃciency for performing the RS image enhancement
and reconstruction tasks in (near) real computational time.
The tested DEDR-POCS-related techniques implemented
numerically using the proposed HW/SW codesigned compu-
tational architecture overperform the previously developed
methods both in the attained reconstruction quality and the
achievable computational complexity, that is manifest the
substantially reduced overall computational time (e.g., up to
three orders with respect to the schemes that do not aggregate
the POCS regularization with the systolic computing). We
do believe that pursuing the DEDR-POCS-related HW/SW
codesign paradigm with systolic array hardware accelerators
one could approach definitely the real time computational
requirements while performing the reconstructive process-
ing of the large-scale RS imagery attaining the enhance-
ment/reconstruction performance gains close to the limiting
bounds.
References
[1] D. R. Wehner, High-Resolution Radar, Artech House, Boston,
Mass, USA, 2nd edition, 1994.
[2] J. S. Lee, “Speckle suppression and analysis for synthetic
aperture radar images,” Optical Engineering, vol. 25, no. 5, pp.
636–643, 1986.
[3] F. M. Henderson and A. V. Lewis, “Principles and applications
of imaging radar,” in Manual of Remote Sensing, John Wiley &
Sons, New York, NY, USA, 3rd edition, 1998.
[4] Y. V. Shkvarko, “Estimation of wavefield power distribution
in the remotely sensed environment: Bayesian maximum
entropy approach,” IEEE Transactions on Signal Processing, vol.
50, no. 9, pp. 2333–2346, 2002.
[5] Y. V. Shkvarko, “Unifying regularization and Bayesian esti-
mation methods for enhanced imaging with remotely sensed
EURASIP Journal on Advances in Signal Processing 21
data—part I: theory,” IEEE Transactions on Geoscience and
Remote Sensing, vol. 42, no. 5, pp. 923–931, 2004.
[6] Y. V. Shkvarko, “Unifying regularization and Bayesian esti-
mation methods for enhanced imaging with remotely sensed
data—part II: implementation and performance issues,” IEEE
Transactions on Geoscience and Remote Sensing, vol. 42, no. 5,
pp. 932–940, 2004.
[7] Y. V. Shkvarko, “From matched spatial filtering towards
the fused statistical descriptive regularization method for
enhanced radar imaging,” EURASIP Journal on Applied Signal
Processing, vol. 2006, Article ID 39657, 9 pages, 2006.
[8] Y. V. Shkvarko, H. Perez-Meana, and A. Castillo-Atoche,
“Enhanced radar imaging in uncertain environment: a
descriptive experiment design regularization paradigm,” Inter-
national Journal of Navigation and Observation, vol. 2008,
Article ID 810816, 11 pages, 2008.
[9] H. H. Barrett and K. J. Myers, Foundations of Image Science,
John Willey & Sons, New York, NY, USA, 2004.
[10] A. Castillo, Y. V. Shkvarko, D. Torres, and H. M. Perez,
“Convex regularization-based hardware/software co-design
for real-time enhancement of remote sensing imagery,” Inter-
national Journal of Real Time Image Processing, vol. 4, no. 3, pp.
261–272, 2008.
[11] V. I. Ponomaryov, “Real-time 2D-3D filtering using order
statistics based algorithms,” Journal of Real-Time Image Pro-
cessing, vol. 1, no. 3, pp. 173–194, 2007.
[12] C. T. Yang, C. L. Chang, C. C. Hung, and F. Wu, “Using
a Beowulf cluster for a remote sensing application,” in
Proceedings of the 22nd Asian Conference on Remote Sensing,
vol. 1, Singapore, November 2001.
[13] “Thunderhead System,” http://newton.gsfc.nasa.gov/thunder-
head/.
[14] A. Plaza and J. Plaza, “Parallel morphological classification of
hyperspectral imagery using extended opening and closing by
reconstruction operations,” in Proceedings of the International
Geoscience and Remote Sensing Symposium (IGARSS ’08), pp.
I.58–I.61, Boston, Mass, USA, July 2008.
[15] U. Meyer-Baese, Digital Signal Processing with Field Pro-
grammable Gate Array, Springer, Berlin, Germany, 2001.
[16] J. Greco, G. Cieslewski, A. Jacobs, I. A. Troxel, and A. D.
George, “Hardware/software interface for high-performance
space computing with FPGA coprocessors,” in Proceedings of
the IEEE Aerospace Conference, pp. 10–25, Big Sky, Mont, USA,
March 2006.
[17] “Fixed-Point ToolboxTM User’s Guide, MATLAB,” http://www
.mathworks.com/.
[18] “EDK 9.1 MicroBlaze tutorial in Virtex-4,” Xilinx, http://www
.xilinx.com/.
[19] A. Marquardt, V. Betz, and J. Rose, “Speed and area tradeoﬀs in
cluster-based FPGA architectures,” IEEE Transactions on Very
Large Scale Integration (VLSI) Systems, vol. 8, no. 1, pp. 84–93,
2000.
[20] M. Lo´pez-Vallejo and J. C. Lo´pez, “On the hardware-software
partitioning problem: system modeling and partitioning tech-
niques,” ACM Transactions on Design Automation of Electronic
Systems, vol. 8, no. 3, pp. 269–297, 2003.
[21] S. C. Lo and S. N. Jean, “Mapping algorithms to VLSI array
processors,” in Proceedings of the International Conference on
Acoustics, Speech, and Signal Processing (ICASSP ’88), pp.
2033–2036, New York, NY, USA, 1988.
[22] S. Y. Kung, VLSI Array Processors, Prentice Hall, Upper Saddle
River, NJ, USA, 1988.
[23] “Xilinx Application Note XAPP967: creating an OPB IPIF-
based IP and using it in EDK,” 2007.
[24] “Space Imaging,” GeoEye, 2008, http://www.euspaceimaging
.com/.
