PARSAR: A SAR processor implemented in a cluster of workstations by Martínez, A et al.
PARSAR: A SAR PROCESSOR IMPLEMENTED IN A CLUSTER OF WORKSTATIONS 
A. Martinez, F. Fraile
Remote Sensing Dep., INDRA Espacio
C/ Mar Egeo s/n. 28850-S. Fernando de Henares, SPAIN
Tlf. +34 1 396 3911. Fax +34 1 396 3912
e-mail: amar@mdr.indra-espacio.es
J.J. Mallorqui, L. Nogueira, J. Gabalda, A. Broquetas, A. Gonzalez(*)
Signal Theory & Communications Dep., U. Politecnica Catalunya
(*) Computer Architecture Dep., U. Politecnica Catalunya
C/ Sor Eulalia de Anzizu s/n, Ed. D-3. 08071-Barcelona, SPAIN
Tlf. +34 3 401 7229. Fax +34 3 401 7232
e-mail: mallorqu@voltor.upc.es
ABSTRACT 
A parallel SAR processor is presented in this paper. The
target configuration is a cluster of UNIX workstations,
which is available at most user sites. This makes it possible
to increase computing performance without investing in
dedicated hardware.
1. INTRODUCTION 
Synthetic Aperture Radar, SAR, is a remote sensing
instrument capable of obtaining high resolution images of
the Earth surface [1]. The goal of SAR processing is to
transform the SAR raw data, or SAR signal, into an image.
The generation of SAR images involves both a large amount
of data and a complex focusing algorithm, which makes the
application of HPCN technology attractive.
The objective of the PARSAR project is to port a
sequential SAR processor to a parallel architecture. The
target configuration is a cluster of UNIX workstations,
which makes it possible to exploit the benefits of
parallelisation without hardware investment.
This paper describes the parallelisation strategy of the
processor, based on a multi-block approach using PVM as the
interface among processes. Some preliminary performance
results are presented, along with the on-going activities.
2. DESCRIPTION OF DATA AND ALGORITHM 
The SAR data used in the project correspond to the standard
full frame scene (100 x 100 km2) of the European ERS
satellite [2]. The raw data is a matrix of 26800 lines, each
consisting of 5616 samples or pixels. The pixels are
complex numbers, coded in 1 + 1 bytes, and the size of the
raw data file is about 300 MB. The calculations are
performed in floating point format, resulting in a processing
matrix of 1.2 GB. The output SAR image has 25000 lines,
each consisting of 4912 samples. The pixels are
complex and coded in 2 + 2 bytes, resulting in about 500
MB.
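The sizes quoted above follow directly from the matrix dimensions. The short Python sketch below reproduces the arithmetic. It assumes that the floating point processing matrix stores each complex pixel as two 4-byte floats (8 bytes per pixel), which the text does not state explicitly, and that MB and GB are decimal units.

```python
def matrix_bytes(lines, samples, bytes_per_pixel):
    """Size in bytes of a lines x samples matrix of complex pixels."""
    return lines * samples * bytes_per_pixel


# Raw data: 26800 lines x 5616 samples, 1 + 1 bytes per complex pixel.
raw_bytes = matrix_bytes(26800, 5616, 1 + 1)
# Processing matrix: same dimensions, ASSUMED 4 + 4 bytes per complex pixel.
work_bytes = matrix_bytes(26800, 5616, 4 + 4)
# Output image: 25000 lines x 4912 samples, 2 + 2 bytes per complex pixel.
image_bytes = matrix_bytes(25000, 4912, 2 + 2)

for name, size in [("raw data", raw_bytes),
                   ("processing matrix", work_bytes),
                   ("output image", image_bytes)]:
    print(f"{name}: {size / 1e6:.0f} MB")
```

Under these assumptions the three figures come out close to the 300 MB, 1.2 GB and 500 MB quoted in the text.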
Figure 1. Flow chart of the sequential CSA processor
The focusing of SAR raw data is essentially a 2-D 
correlation of the input signal with the SAR Impulse 
Proceedings of the DASIA 97 Conference on 'Data Systems in Aerospace', Sevilla, Spain, 26-29 May 1997 (SP-409, August 1997) 
© European Space Agency • Provided by the NASA Astrophysics Data System 
Response Function [3]. Classical methods of SAR
processing implement the signal compression in the
frequency domain. The SAR processor used in the project
implements the Chirp Scaling Algorithm, CSA [4]. The CSA
involves only FFTs and multiplications. Its structure is
relatively simple, consisting of a sequence of 1-D
FFTs and element-by-element matrix products (see figure 1).
The number of operations and loops is constant because the
CSA does not use iterative procedures.
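The structure described above, a fixed sequence of 1-D transforms and element-by-element products, can be sketched in a few lines of Python. This is only a structural skeleton: figure 1 is not reproduced here, so the order of the stages follows the usual description of chirp scaling in [4] and is an assumption, the three phase matrices are placeholders supplied by the caller, and a naive DFT stands in for the FFT to keep the example self-contained. All function names are hypothetical.

```python
import cmath


def dft(vec, inverse=False):
    """Naive 1-D DFT (stand-in for an FFT; O(n^2), for illustration only)."""
    n = len(vec)
    sign = 1 if inverse else -1
    out = [sum(vec[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
               for k in range(n)) for m in range(n)]
    return [v / n for v in out] if inverse else out


def transform_axis(matrix, axis, inverse=False):
    """Apply the 1-D transform along azimuth (axis=0) or range (axis=1)."""
    if axis == 1:
        return [dft(row, inverse) for row in matrix]
    columns = [dft(list(col), inverse) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]


def multiply(matrix, phase):
    """Element-by-element matrix product."""
    return [[a * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(matrix, phase)]


def csa_skeleton(raw, phase1, phase2, phase3):
    """Fixed, non-iterative chain of transforms and products (assumed order)."""
    s = transform_axis(raw, axis=0)                # azimuth transform
    s = multiply(s, phase1)                        # chirp scaling product
    s = transform_axis(s, axis=1)                  # range transform
    s = multiply(s, phase2)                        # range compression product
    s = transform_axis(s, axis=1, inverse=True)    # inverse range transform
    s = multiply(s, phase3)                        # azimuth compression product
    return transform_axis(s, axis=0, inverse=True) # inverse azimuth transform
```

Because every stage is either a transform or a product, the operation count depends only on the matrix size, which is the property the text refers to when it says the number of operations is constant.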
3. DESIGN OF THE PARALLELISATION 
The parallelisation strategy that has been implemented is
called the Multi-Block Approach, MBA. It consists of dividing
the input data into independent processing blocks; each
block is fully processed in one host (child process),
resulting in a small piece of the final image. MBA can be
seen as a coarse grained parallelisation strategy. PVM,
Parallel Virtual Machine, is used to control the whole
process. PVM is a software system that permits a network
of heterogeneous UNIX workstations to be used as a single
large parallel computer [5].
The main advantage of MBA is the full independence
between the different tasks split among the available hosts.
Each child process works at its own pace and does not
need information from other children. Consequently, fast
and slow workstations can coexist in the same cluster
without penalizing the whole process. Another interesting
point of MBA is the minimization of both the number of I/O
operations and the amount of data to be transferred across
the cluster.
The implementation of the MBA parallelisation is relatively
easy, as the CSA processor does not have to be split. Each
host in the cluster runs a CSA processor very similar to the
sequential one. Consequently, improvements in the
sequential code are readily portable to the parallel software.
On the other hand, the drawback of MBA is the correlation
efficiency, which strongly depends on the size of the
processing data blocks and hence on the available RAM in
each host.
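The dependence of the correlation efficiency on block size can be illustrated with a small Python sketch. The paper does not give a formula, so the sketch assumes the usual block-correlation picture: a block of N azimuth lines correlated with a reference function of L lines yields N - L + 1 valid output lines, so consecutive blocks must overlap and the efficiency is (N - L + 1) / N. The function names and the numbers used (4000-line and 8000-line blocks, a 1000-line reference function) are hypothetical and are not ERS parameters.

```python
def split_blocks(total_lines, block_lines, filter_lines):
    """Return (start, stop) line ranges of overlapping processing blocks.

    Each block contributes block_lines - filter_lines + 1 valid lines,
    so the next block starts that many lines further on (assumed model).
    """
    valid = block_lines - filter_lines + 1
    if valid <= 0:
        raise ValueError("block must be longer than the reference function")
    blocks = []
    start = 0
    while start + filter_lines <= total_lines:
        stop = min(start + block_lines, total_lines)
        blocks.append((start, stop))
        if stop == total_lines:
            break
        start += valid
    return blocks


def correlation_efficiency(block_lines, filter_lines):
    """Fraction of the lines in a block that produce valid output."""
    return (block_lines - filter_lines + 1) / block_lines


# Illustrative values only: larger blocks waste a smaller fraction of lines.
for block_lines in (4000, 8000):
    n_blocks = len(split_blocks(26800, block_lines, 1000))
    eff = correlation_efficiency(block_lines, 1000)
    print(f"{block_lines} lines per block: {n_blocks} blocks, efficiency {eff:.2f}")
```

In this model a host with more RAM can hold a longer block and therefore spends a smaller share of its work on overlap lines, which is one way to read the drawback mentioned above.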
The flow chart of the parallel SAR processor is shown in
figure 2. There is a main process, the parent process, that is
in charge of launching the different tasks on the computers of
the network and managing their execution. There are
three types of tasks or slave processes:
- cutter. This task is responsible for reading the raw data file
and generating the different data blocks that will be sent to
the nodes.
- child. This is a simplified version of the CSA processor
that reads a data block and produces a small piece of the
final image (imagette).
- builder. The builder reads the different pieces of the image
and assembles the final SAR image file.
Both the raw data blocks and the imagettes generated by the
processors are stored in temporary files. At first sight, the
use of temporary files can be seen as a drawback, due to the
increased number of I/O operations and disk usage.
Nevertheless, tests conducted without temporary files
showed the importance of disk access conflicts when
different child processes read from and write to the same
large files. Furthermore, without temporary files the disk
operations of the child processes have to be done by direct
access, which is highly inefficient.
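The role of the temporary files can be sketched as follows. This is a minimal Python illustration, not the PARSAR code: the function names and the file naming scheme are hypothetical, the blocks are cut without any overlap between neighbours, and the imagettes are simply concatenated in order. The point it shows is that each file is written and read by one process only, and always sequentially.

```python
import os


def cut_raw_file(raw_path, line_bytes, lines_per_block, out_dir):
    """Cutter sketch: read the raw file sequentially, one temp file per block."""
    block_paths = []
    chunk = line_bytes * lines_per_block
    with open(raw_path, "rb") as raw:
        index = 0
        while True:
            data = raw.read(chunk)          # sequential read of the next block
            if not data:
                break
            path = os.path.join(out_dir, f"block_{index:04d}.raw")
            with open(path, "wb") as out:   # each block gets its own file
                out.write(data)
            block_paths.append(path)
            index += 1
    return block_paths


def build_image(imagette_paths, image_path):
    """Builder sketch: append imagettes in order, deleting each one after use."""
    with open(image_path, "wb") as image:
        for path in imagette_paths:
            with open(path, "rb") as piece:
                image.write(piece.read())
            os.remove(path)                 # free disk space as soon as possible
```

A child process in this scheme would open exactly one block file for reading and one imagette file for writing, so no two slave processes ever compete for the same file.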
Figure 2. Flow chart of the parallel processor
The parallel code starts by generating a set of data blocks
that are assigned to the available hosts. The parent process
continuously examines the status of the different hosts in the
network. When a host is free, the parent assigns it a new
block. The priorities in the parent process are:
- 1. Cutter process. This is needed to ensure that there are
data blocks available for the child processes.
- 2. Child processes. Once a host finishes the processing of
a block, its waiting time is to be minimized.
- 3. Builder process. When the imagettes corresponding to
an image block are available, the corresponding part of the
final file is assembled to free disk space.
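The priority order above can be illustrated with a toy scheduling loop in Python. It is a sketch under strong simplifying assumptions that are not in the paper: every task takes exactly one time step, there is a single cutter and a single builder, and no PVM calls are made. The function name and the returned schedule format are hypothetical.

```python
def simulate(total_blocks, hosts):
    """Return, per time step, the tasks the parent starts on the free hosts."""
    if hosts < 1:
        raise ValueError("at least one host is required")
    uncut, ready, imagettes = total_blocks, 0, 0
    schedule = []
    while uncut or ready or imagettes:
        free = hosts
        step = []
        run_cutter = uncut > 0 and free > 0
        if run_cutter:                      # priority 1: keep blocks available
            step.append("cutter")
            free -= 1
        children = min(ready, free)         # priority 2: feed idle hosts
        step.extend(["child"] * children)
        free -= children
        run_builder = imagettes > 0 and free > 0
        if run_builder:                     # priority 3: assemble, free disk
            step.append("builder")
        # Apply the effects of the tasks started in this step.
        if run_cutter:
            uncut -= 1
            ready += 1
        ready -= children
        imagettes += children
        if run_builder:
            imagettes -= 1
        schedule.append(step)
    return schedule


for step_number, tasks in enumerate(simulate(total_blocks=3, hosts=2), start=1):
    print(step_number, tasks)
```

Even in this toy model the builder only gets a host once the cutter has finished and the children leave one free, which reflects the ordering of the three priorities.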
With this strategy, the disk bottleneck problems are
alleviated by ensuring that the different slave processes
never access the same file, and most disk operations can
be carried out by using the more efficient sequential access.
The number of hosts to be used in the "parallel machine"
can be selected, as well as the tasks to be conducted by each
node.
4. PRELIMINARY RESULTS 
The classical parameters to estimate the performance of the
parallel code, speed up factor (ratio of the processing time
in one node to the processing time in N nodes) and
efficiency (speed up factor over the number of nodes), are
not readily portable to a heterogeneous cluster (each node
has its own processing time). We have used an alternative
definition for these parameters that is intuitive and makes
sense for a heterogeneous set of computers:
- efficiency: the ratio of the number of products generated
by the parallel code in a given time to the number of
products generated by the sequential code running on all the
computers of the cluster during the same time.
- speed up: the efficiency times the number of computers in
the cluster.
The results of the parallel processor running in different 
hardware configurations are listed in the next tables, along 
with the characteristics of the workstations in the clusters 
and the processing time of the sequential processor in each 
workstation. 
The first test used a block size of 32 MB (note that 2 of the
computers in the cluster have 64 MB RAM). The main,
cutter and builder processes were run on the HP-720, so that
this host is fully dedicated to data handling. The remaining
two computers were in charge of SAR processing. The
efficiency of the parallel code is 0.64, with a speed up factor
of 1.93.
MODEL     Clock Rate   RAM      Proc. Time
HP-735    99 MHz       96 MB    150 min
HP-720    50 MHz       64 MB    320 min
HP-715    50 MHz       64 MB    320 min
CLUSTER   -            -        120 min
Table 1. Processing time in UPC cluster. Processing block
32 MB. Efficiency is 0.65.
The results of the tests at INDRA are presented in table 2.
Here a block size of 64 MB was used, as the computers in
the cluster have more available RAM. There are two test
cases:
- Case 1: Only the three Sun computers are used. The
processing time is 58 minutes, resulting in an efficiency of
0.68 and a speed up factor of 2.03.
- Case 2: The four computers in the cluster are used. The
processing time is 46 minutes, resulting in an efficiency of
0.56 and a speed up factor of 2.24.
In the two test cases, the slowest host in the cluster (Sun
Sparc 10) was used to execute the main, cutter and builder
processes.
MODEL                   Clock Rate   RAM      Proc. Time
HP C160-L               160 MHz      128 MB   75 min
Sun Ultra 1/170         167 MHz      128 MB   75 min
Sun Sparc 20/71         75 MHz       128 MB   135 min
Sun Sparc 10/41         40 MHz       128 MB   210 min
Case 1: Sun computers   -            -        58 min
Case 2: All hosts       -            -        46 min
Table 2. Processing time in INDRA cluster. Processing
block 64 MB.
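The efficiency and speed up definitions of this section can be checked against the figures in Tables 1 and 2. The Python sketch below assumes the following reading of the definition: the efficiency is the throughput of the parallel code (one product per cluster processing time) divided by the combined throughput of the sequential code running independently on every host of the cluster. The function names are hypothetical.

```python
def efficiency(cluster_minutes, sequential_minutes):
    """Parallel throughput over the summed sequential throughput of all hosts."""
    parallel_rate = 1.0 / cluster_minutes
    combined_rate = sum(1.0 / t for t in sequential_minutes)
    return parallel_rate / combined_rate


def speed_up(cluster_minutes, sequential_minutes):
    """Efficiency times the number of computers in the cluster."""
    return efficiency(cluster_minutes, sequential_minutes) * len(sequential_minutes)


# (cluster time, sequential time of each host), in minutes, from Tables 1 and 2.
cases = {
    "UPC, Table 1": (120, [150, 320, 320]),
    "INDRA, case 1": (58, [75, 135, 210]),
    "INDRA, case 2": (46, [75, 75, 135, 210]),
}
for name, (t_cluster, t_hosts) in cases.items():
    eff = efficiency(t_cluster, t_hosts)
    spd = speed_up(t_cluster, t_hosts)
    print(f"{name}: efficiency {eff:.3f}, speed up {spd:.2f}")
```

With this reading the computed values agree with the reported efficiencies and speed up factors to within rounding. For the UPC cluster the efficiency evaluates to roughly 0.645, which is compatible with both the 0.64 quoted in the text and the 0.65 in the caption of Table 1.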
The results of the tests show that the parallel processor
works relatively well in a cluster of three workstations. The
efficiency figures obtained in the clusters at UPC and
INDRA are equivalent. However, when an additional host is
added to the cluster, the efficiency decreases. This
is because the computers performing SAR
processing are faster than the cutter, so that they have to
wait for data blocks to process.
Work is on-going to upgrade and
fine tune the performance of the parallel code. In particular,
we may mention the following points:
- Optimization of the I/O operations, mainly the cutter
process. This should allow the use of more workstations in
the cluster without loss of efficiency.
- Allowing the processing data block size to be adjusted by
each host in the cluster, so that computers with different
RAM can be used simultaneously.
5. CONCLUSIONS 
The first activities in the porting of a sequential SAR
processor to a parallel architecture have been performed,
including the selection of the parallelisation strategy and the
implementation of the first parallel prototype. The parallel
software is flexible and portable, so that it can be installed
at most user sites.
The preliminary results obtained with the parallel code are
encouraging, and show a decrease in processing time
with respect to the sequential code. Good
results were obtained with a cluster of three workstations.
Additional work is on-going to enhance and fine-tune the
code, so that the efficiency of the parallel code can be kept
constant when adding more hosts to the cluster.
6. REFERENCES 
[1] Elachi C. "Spaceborne Radar Remote Sensing", IEEE
Press, 1988.
[2] ESA "ESA ERS-1 Product Specifications", ESA SP-1149,
1992.
[3] Curlander J.C. & McDonough R.N. "SAR: Systems and
Signal Processing", John Wiley & Sons, 1991.
[4] Raney K. et al. "Precision SAR Processing Using Chirp
Scaling", IEEE Trans. Geosci. Remote Sensing, vol. 32,
pp. 786-799, 1994.
[5] Geist A. et al. "PVM 3 User's Guide and Reference
Manual", Report ORNL/TM-12187, 1994.
[6] Martinez A. and Marchand J.L. "SAR image quality
assessment", Rev. Teledetección 2, pp. 12-18, 1993.
[7] Martinez A. et al. "Advanced Algorithm Techniques:
Enhancement of CSA and Quicklook SAR Algorithms Final
Report", ESA ESTEC Contract 3-8616/95/NL/FM, 1997.
[8] Geist A. et al. "PVM 3 User's Guide and Reference
Manual", Report ORNL/TM-12187, 1994.
[9] Sanchez H. and Laur H. "ERS-1 SAR Product
Validation", Proc. CEOS SAR Calibration Workshop, ESA
WPP-048, pp. 295-305, 1993.
7. ACKNOWLEDGEMENTS 
PARSAR is a project supported by PCI-II, ESPRIT IV.
