PARSAR: Parallelisation of a chirp scaling algorithm SAR processor by Martínez, A et al.
PARSAR: Parallelisation of a Chirp Scaling 
Algorithm SAR Processor* 
Antonio Martinez, Francisco Fraile 
Remote Sensing Dep., INDRA Espacio 
C/Mar Egeo s/n. 28850-S.Fernando de Henares, SPAIN 
e-mail: amar@mdr.indra-espacio.es 
Jordi Mallorqui, Leonardo Nogueira, Jordi Gabald~i, Antoni Broquetas, 
Antonio Gonzalez (#) 
Signal Theory & Communications Dep., U.Politecnica Catalunya 
(#) Computer Architecture Dep., U.Politecnica Catalunya 
C/Sor Eulalia de Anzizu s/n, Ed. D-3. 08071-Barcelona, SPAIN 
Abstract. A parallel SAR processor is presented in this paper. The target 
configuration is a cluster of UNIX workstations, available at most user 
sites. This makes it possible to obtain increased computing performance 
without investing in dedicated hardware. 
1 Introduction 
Synthetic Aperture Radar (SAR) is a remote sensing instrument capable of obtaining high 
resolution images of the Earth's surface [1]. The goal of SAR processing is to transform 
the SAR raw data, or SAR signal, into an image. The generation of SAR images 
involves both a large amount of data and a complex focusing algorithm, which 
makes the application of HPCN technology attractive. 
The objective of the PARSAR project is to port a sequential SAR processor to 
a parallel architecture. The target configuration is a cluster of UNIX workstations, 
so that the benefits of parallelisation can be exploited without hardware 
investment. This paper describes the parallelisation strategy of the processor, based on 
a multi-block approach using PVM as the interface among processes. Some preliminary 
performance results are presented, along with the currently on-going activities. 
2 Description of Data and Algorithm 
The SAR data used in the project correspond to the standard full frame scene (100 x 100 
km²) of the ERS satellite [2]. The raw data is a matrix of 26800 lines, each one 
consisting of 5616 samples or pixels. The pixels are complex numbers, coded in 1 + 1 
bytes; the size of the raw data file is about 300 MB. The calculations are done in 
floating point, resulting in a processing matrix of 1.2 GB. The output SAR image has 
25000 lines, each one consisting of 4912 samples. The pixels are complex and coded in 
2 + 2 bytes, resulting in about 500 MB. 
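The data volumes quoted above can be checked with a few lines of arithmetic. The sketch below assumes that each complex sample of the floating point processing matrix takes 4 + 4 bytes (single precision); this is not stated in the text, but it is consistent with the 1.2 GB figure. The function name is illustrative only.

```python
# Back-of-the-envelope check of the data volumes quoted in the text.
# Assumption (not stated in the paper): the floating point processing matrix
# stores each complex sample as two 4-byte floats (4 + 4 bytes).

def matrix_bytes(lines, samples, bytes_per_sample):
    """Size in bytes of a lines x samples matrix of complex pixels."""
    return lines * samples * bytes_per_sample

raw_bytes = matrix_bytes(26800, 5616, 1 + 1)     # raw data, 1 + 1 bytes
float_bytes = matrix_bytes(26800, 5616, 4 + 4)   # assumed single precision
image_bytes = matrix_bytes(25000, 4912, 2 + 2)   # output image, 2 + 2 bytes

for name, size in [("raw data", raw_bytes),
                   ("processing matrix", float_bytes),
                   ("output image", image_bytes)]:
    print(f"{name}: {size / 1e6:.0f} MB")
```

Under that assumption the three sizes come out at roughly 0.3 GB, 1.2 GB and 0.5 GB, in line with the figures given above.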
* This work has been supported by the EU, PCI-II ESPRIT project 21037 
The data volumes involved in SAR processing are high. This fact, along with the 
increased use of SAR data and the advent of applications requiring near real time 
response (e.g. oil pollution monitoring), makes the application of HPCN 
technology attractive. A EUROPORT project is involved in SAR processing; its objective is the 
porting of a library of functions for the analysis, not the generation, of SAR images. 
The focusing of SAR raw data is essentially a 2-D correlation of the input signal with 
the SAR Impulse Response Function [3]. Classical methods of SAR processing 
implement the signal compression in the frequency domain. The SAR processor used in 
the project is based on the Chirp Scaling Algorithm, CSA [4]. The CSA involves only FFTs and 
multiplications. Its structure is relatively simple, consisting of a sequence 
of 1-D FFTs and element-by-element matrix products. 
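To illustrate what compression in the frequency domain means, the following sketch correlates a single line with a reference chirp using one FFT per input, an element-by-element product and one inverse FFT. It does not implement the CSA itself: the chirp rate, the line length and all function names are hypothetical, and a small pure-Python radix-2 FFT stands in for an optimised library routine.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two (unnormalised)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def correlate_fft(signal, reference):
    """Circular correlation: FFT, product with the conjugate, inverse FFT."""
    n = len(signal)
    S = fft(signal)
    R = fft(reference)
    product = [s * r.conjugate() for s, r in zip(S, R)]
    return [v / n for v in fft(product, inverse=True)]

def correlate_direct(signal, reference):
    """Same circular correlation computed from its definition, for checking."""
    n = len(signal)
    return [sum(signal[(m + k) % n] * reference[m].conjugate() for m in range(n))
            for k in range(n)]

# Hypothetical test line: a 16-sample chirp delayed by 20 samples in a 64-sample line.
n, length, delay = 64, 16, 20
chirp = [cmath.exp(1j * cmath.pi * 0.05 * (t - length / 2) ** 2) for t in range(length)]
reference = chirp + [0j] * (n - length)
signal = [0j] * delay + chirp + [0j] * (n - length - delay)

compressed = correlate_fft(signal, reference)
peak = max(range(n), key=lambda k: abs(compressed[k]))
print("peak at sample", peak)  # the compressed response peaks at the delay
```

The direct form needs a number of operations proportional to the square of the line length, whereas the FFT form grows as N log N, which is why the frequency domain is preferred for the long lines of a SAR scene.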
3 Design of the Parallelisation 
The parallelisation strategy that has been implemented is called the Multi-Block Approach, 
MBA. It consists of dividing the input data into independent processing blocks; each 
block is fully processed on one host, resulting in a small piece of the final image. MBA 
can be seen as a coarse-grained parallelisation strategy. PVM, Parallel Virtual Machine, 
is used to control the whole process. 
The main advantage of MBA is the full independence of the different tasks split 
among the available hosts. Another interesting point of MBA is the minimization of both 
the number of I/O operations and the amount of data to be transferred across the cluster. 
The implementation of the MBA parallelisation is relatively easy. Each host in the 
cluster has a CSA processor, very similar to the sequential one, so improvements in the 
sequential code are readily portable to the parallel software. On the other hand, the 
drawback of MBA is the correlation efficiency, which strongly depends on the size of the 
processing data blocks and hence on the available RAM in each host. 
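The dependence of the correlation efficiency on the block size can be illustrated with a simple model. When a block is correlated independently of its neighbours, a fixed number of lines at its edge is lost, so consecutive blocks must overlap by that amount. The sketch below assumes, purely for illustration, that the overlap equals the 1800 lines by which the full scene shrinks (26800 input lines against 25000 output lines); the block lengths and function names are hypothetical and are not taken from the PARSAR code.

```python
import math

def correlation_efficiency(block_lines, overlap_lines):
    """Fraction of a block's lines that survive as valid output lines."""
    if block_lines <= overlap_lines:
        raise ValueError("block must be longer than the overlap")
    return (block_lines - overlap_lines) / block_lines

def blocks_needed(output_lines, block_lines, overlap_lines):
    """Number of overlapping blocks required to cover the output image."""
    return math.ceil(output_lines / (block_lines - overlap_lines))

# Assumption: lines lost per block = lines lost over the whole scene.
OVERLAP = 26800 - 25000

for block_lines in (3000, 6000, 12000):   # hypothetical block lengths
    eff = correlation_efficiency(block_lines, OVERLAP)
    n = blocks_needed(25000, block_lines, OVERLAP)
    print(f"{block_lines} lines/block: efficiency {eff:.2f}, {n} blocks")
```

In this model a small block spends most of its computation on lines that are later discarded, which is why hosts with little RAM penalise the approach.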
The flow chart of the parallel SAR processor is shown in figure 1. There is a main 
process in charge of launching different tasks on the computers of the network and 
managing their execution. There are three types of tasks or slave processes: 
- cutter. This task is responsible for reading the raw data file and generating the different 
data blocks that will be sent to the nodes. 
- child. This is a simplified version of the CSA processor, which reads a data block and 
produces a small piece of the final image (imagette). 
- builder. The builder reads the different pieces of the image and assembles the final 
SAR image file. 
Both the raw data blocks and the imagettes generated by the processors are stored 
in temporary files. The use of temporary files can be seen as a drawback; however, tests 
conducted without temporary files showed the importance of disk access conflicts when 
different child processes read from and write to the same large files. Furthermore, in that 
case the disk operations of the child processes are direct access, which is highly inefficient. 
The parallel code starts by generating a set of data blocks that are assigned to the 
available hosts. The parent process continuously examines the status of the hosts in the 
network. When a host is free, the parent assigns it a new block. The priorities in the main 
process are: 1. Cutter process, to ensure that there are data blocks available for the child 
processes; 2. CSA processes; and 3. Builder process. 
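The rule of giving a new block to whichever host is free can be illustrated with a small simulation. This is only a sketch of the assignment rule: it ignores the cutter and builder priorities, it does not use PVM, and the host names and per-block times are hypothetical.

```python
import heapq

def schedule_blocks(n_blocks, minutes_per_block):
    """Assign blocks one at a time to whichever host becomes free first.

    minutes_per_block maps a host name to the (hypothetical) time that host
    needs for one block. Returns (total elapsed minutes, blocks done per host).
    """
    # Heap of (time at which the host becomes free, host name).
    free_at = [(0.0, host) for host in sorted(minutes_per_block)]
    heapq.heapify(free_at)
    done = {host: 0 for host in minutes_per_block}
    finish = 0.0
    for _ in range(n_blocks):
        t, host = heapq.heappop(free_at)        # earliest free host
        t_end = t + minutes_per_block[host]
        done[host] += 1
        finish = max(finish, t_end)
        heapq.heappush(free_at, (t_end, host))  # host is busy until t_end
    return finish, done

# Hypothetical per-block times for a fast and a slow host.
total, per_host = schedule_blocks(9, {"fast": 10.0, "slow": 20.0})
print(total, per_host)
```

With this rule the faster host naturally receives more blocks, so a heterogeneous cluster balances itself without any prior knowledge of the host speeds.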
Fig 1. Flow chart of the parallel processor 
With this strategy, the disk bottleneck problems are alleviated by ensuring that the 
different slave processes never access the same file, and most disk operations can be 
carried out using the more efficient sequential access. The number of hosts to be used 
in the "parallel machine" can be selected, as well as the tasks to be conducted by each 
node. 
4 Preliminary Results 
The classical parameters used to estimate the performance of parallel code, speed up factor 
and efficiency, are not readily applicable to a heterogeneous cluster, as each node has its 
own processing time. We have used an alternative definition of these parameters that 
is intuitive and meaningful for a heterogeneous set of computers: 
- Efficiency: ratio of the number of standard products generated by the parallel code to 
the number of products generated by the sequential code running on all the computers 
of the cluster during the same time. 
- Speed up: the efficiency times the number of computers in the cluster. 
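These two definitions can be written compactly. As a sketch, let T_par be the processing time of the parallel code on a cluster of N hosts, and T_i the processing time of the sequential code on host i (these symbols are introduced here for illustration only). During a time T_par the sequential code on host i generates T_par / T_i products, while the parallel code generates one, so that:

```latex
E = \frac{1}{\sum_{i=1}^{N} T_{\mathrm{par}} / T_i},
\qquad
S = N \cdot E
```

For a homogeneous cluster, where every T_i equals a common T_seq, these expressions reduce to the classical ones, E = T_seq / (N T_par) and S = T_seq / T_par.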
The results of the parallel processor running in different configurations are listed in 
the following tables, along with the characteristics of the workstations in the clusters and the 
processing time of the sequential processor on each workstation. 
The first test used a block size of 32 MB (note that 2 of the computers in the cluster 
have 64 MB RAM). The main, cutter and builder processes were run on the HP-720, so that 
this host was fully dedicated to data handling. The remaining two computers were in 
charge of SAR processing. The efficiency of the parallel code is 0.64, with a speed up 
factor of 1.93. 
MODEL      Clock Rate   RAM      Proc. Time 
HP-735     99 MHz       96 MB    150 min 
HP-720     50 MHz       64 MB    320 min 
HP-715     50 MHz       64 MB    320 min 
CLUSTER    -            -        120 min 
Table 1. Processing time in UPC cluster. Processing block 32 MB. 
The results of the tests at INDRA are presented in table 2. Here a block size of 64 
MB was used, and the slowest host executed the main, cutter and builder processes. There 
are two test cases: 
- Case 1: Only the 3 Sun computers were used. The processing time is 58 minutes, 
resulting in an efficiency of 0.68 and a speed up factor of 2.03. 
- Case 2: All the computers were used. The processing time is 46 minutes, resulting in an 
efficiency of 0.56 and a speed up factor of 2.24. 
MODEL         Clock Rate   RAM      Proc. Time 
HP C160-L     160 MHz      128 MB   75 min 
Sun Ultra 1   167 MHz      128 MB   75 min 
Sun S-20      75 MHz       128 MB   135 min 
Sun S-10      40 MHz       128 MB   210 min 
3 Sun WSs     -            -        58 min 
All hosts     -            -        46 min 
Table 2. Processing time in INDRA cluster. Processing block 64 MB. 
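The efficiency and speed up figures can be recomputed from the processing times in tables 1 and 2. The sketch below applies the definitions of section 4 literally: in the time taken by the parallel run, the sequential code on a host with processing time t would generate (parallel time) / t products. The function names are illustrative only.

```python
def efficiency(parallel_min, sequential_min):
    """Products made by the parallel run (one) divided by the products the
    sequential code would make on every host during the same time."""
    sequential_products = sum(parallel_min / t for t in sequential_min)
    return 1.0 / sequential_products

def speed_up(parallel_min, sequential_min):
    """Efficiency times the number of computers in the cluster."""
    return efficiency(parallel_min, sequential_min) * len(sequential_min)

# Times in minutes, copied from tables 1 and 2.
cases = {
    "UPC, 3 hosts": (120, [150, 320, 320]),
    "INDRA, 3 Suns": (58, [75, 135, 210]),
    "INDRA, all hosts": (46, [75, 75, 135, 210]),
}
for name, (t_par, t_seq) in cases.items():
    print(f"{name}: efficiency {efficiency(t_par, t_seq):.3f}, "
          f"speed up {speed_up(t_par, t_seq):.3f}")
```

The values obtained this way agree with the figures quoted in the text to within 0.01, the small differences being attributable to rounding of the tabulated times.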
The results of the tests show that the parallel processor works relatively well in a 
cluster of three workstations. The efficiency figures obtained in the clusters at UPC and 
INDRA are comparable. However, when an additional host is added to the cluster, the 
efficiency decreases. This is because the computers performing SAR 
processing are faster than the cutter, so that they have to wait for data blocks to process. 
Work is currently on-going to upgrade and fine-tune the performance of the parallel 
code. In particular, we may mention the following points: 
- Optimization of the I/O operations (cutter process). This should allow the use of more 
workstations in the cluster without loss of efficiency. 
- Allowing the size of the processing block to be adapted to each host, so that computers 
with different RAM can be used simultaneously. 
5 Conclusions 
The first activities in the porting of a sequential SAR processor to a parallel architecture 
have been performed. The parallel software is flexible and portable, so that it can be 
installed at most user sites. The preliminary results obtained with the parallel code are 
encouraging, and show a decrease in the processing time of the parallel code with 
respect to the sequential one. Good results were obtained with a cluster of three 
workstations. Additional work is on-going to enhance and fine-tune the code, so that the 
efficiency of the parallel code can be kept constant when adding more hosts to the 
cluster. 
References 
1. Elachi C.: Spaceborne Radar Remote Sensing. IEEE Press, 1988. 
2. ESA: ESA ERS-1 Product Specifications. ESA SP-1149, 1992. 
3. Curlander J.C. & McDonough R.N.: SAR: Systems and Signal Processing. John 
Wiley & Sons, 1991. 
4. Raney R.K., Runge H., Bamler R., Cumming I.G., Wong F.H.: Precision SAR 
Processing Using Chirp Scaling. IEEE Trans. Geosci. Remote Sensing (1994) Vol. 
32 pp. 786-799. 
