Signal processor architecture for backscatter radars by Johnston, P. & Swartz, W. E.
8.3C SIGNAL PROCESSOR ARCHITECTURE FOR BACKSCATTER RADARS 
W. E. Swartz* and P. Johnston**
*School of Electrical Engineering, Cornell University, Ithaca, NY 14843
**NOAA, Poker Flat MST Radar, P.O. Box 80128, College, AK 99708-0128
ABSTRACT 
Real-time signal processing for backscatter radars requires enormous
computational throughput and I/O rates; however, the operations that are usually
performed in real time are highly repetitive simple accumulations of samples or
of products of samples. Furthermore, since the control logic does not depend on
the values of the data, general-purpose computers are not required for the
initial high-speed processing. The implications of these facts on the
architectures of preprocessors for backscatter radars are explored and applied
to the design of the Radar Signal Compender.

The Radar Signal Compender is a programmable high-speed pipelined real-time
multiprocessor machine intended for coherent and incoherent backscatter radars.
Its architecture lends itself to time-critical processing where the operations
performed are only the direct accumulations of samples or the accumulations of
products of the original samples. The programmability of this machine allows it
to be adapted to a wide range of experiments, yet without the difficulty usually
found with more general-purpose array processors.

The Compender is composed of several Functional Modules which parallel process
multiple data streams, a Master Control Module which provides for timing and
communication between the host computer and each of the Functional Modules, and
an Analog-to-Digital Conversion Module which feeds samples directly into the
input memories of the Functional Modules under the control of external timing
logic. Each of the Functional Modules can be individually programmed under the
control of the Host Computer and the Master Control Module. Control of each of
the data-processing pipelines is nearly transparent to the user, in that control
operands are tagged to the sample address operands and then follow the
processing through a control pipeline for use at the proper stage. Input and
output memories are fully double buffered for most usual configurations, and all
memories are 2 k words deep. The four input memories of each Functional Module
are 16 bits wide, while the four output memories are each 32 bits wide and can
be configured as two 64-bit wide memories.

Programming the device consists of the loading of the configuration registers
and the address control RAMs of each Functional Module using simple directives
to the Master Control. The configuration registers establish the data flow paths
that are uniquely determined for a given experiment. The address control RAMs
consist of BASE plus DISPLACEMENT operands with flexible incrementing and
looping control.

A Compender with 10 Functional Modules and high-speed memories should be
capable of a throughput of 100 MHz for multiply-replace-add sequences. The more
modest version for the Poker Flat MST radar with 6 Functional Modules and slower
memories achieves a 30-MHz throughput.
INTRODUCTION 
Since the signals received from backscatter radars are noise like, the
basic requirement of the processing hardware is to average as many samples as
possible in as short a time as possible. For some experiments, the
computational limitations restrict only the amount or quality of the real-time
displays that can be generated. For others the limitation is a trade-off
between what can be done in real-time versus what must be done off-line. Yet
for many experiments, the actual science in terms of height resolution, time
resolution, number of heights, bias corrections, or dynamic interaction is
limited by insufficient compute power.

The most popular atmospheric backscatter radar experiments can be split
between four major headings, as shown in Table 1. Since the correlation times
of the medium being probed under each of the headings differ from one another,
various transmitter pulse and receiver sampling schemes are used to optimize a
given experiment. However, in every case the initial real-time fast processing
is a highly redundant sequence of additions of samples or of products of
samples. The bottom two lines give a comparison of the computational
requirements in terms of the rate of multiply-replace-add operations. It is
obvious that even state-of-art general-purpose array processors with single
multipliers and adders cannot keep up for experiments requiring rates of more
than just a few Megahertz. Remember too that commercial array processors use
floating-point formats, yet integer arithmetic is sufficient provided the data
paths are wide enough to avoid truncation of the summations. Floating-point
formats can lead to subtle biases, and just the conversion from the integer
outputs of the analog-to-digital converters can be a bottleneck within the
processor. Integer logic is simpler and faster; hence, it should be preferred
for the preprocessors used with backscatter radars.
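The requirement that the data paths be "wide enough" can be estimated with simple arithmetic. The following is a minimal sketch, not a description of the RSC logic: it assumes signed two's-complement samples, and the function names and the worst-case test values are hypothetical. It uses the conservative rule that a sum of N terms needs about log2(N) guard bits beyond the width of a single term.

```python
def accumulator_bits(sample_bits, n_sums, products):
    """Conservative width (bits) of a signed accumulator for n_sums terms."""
    # A product of two b-bit signed samples needs up to 2*b bits.
    term_bits = 2 * sample_bits if products else sample_bits
    # Summing n terms can grow the result by ceil(log2(n)) bits (guard bits).
    guard_bits = (n_sums - 1).bit_length()
    return term_bits + guard_bits


def fits(value, bits):
    """True if value is representable as a signed integer of the given width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


# Worst case: every 16-bit sample at its most negative value, squared,
# and accumulated 65536 times.
worst = 65536 * (-32768) * (-32768)
width = accumulator_bits(16, 65536, products=True)
print(width, fits(worst, width), fits(worst, 32))
```

Under these assumptions, long integrations of products of 16-bit samples call for data paths well beyond 32 bits, while direct sums of samples need far fewer guard bits.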
Table 1. Signal processing requirements

                              MST            E REGION          F REGION      PROTONOSPHERE

Interpulse period (msec)      0.5-1.0        2-10              10-15         40
Pulse width (usec)            0.1-4.0        2-4               4-300         1000
Number of pulses per IPP      1              1-7               1-7           1
Coding                        Various        Possibly Barker   Not usually   No
Number of bauds               1-256          7-13              13            -
Sampling rate (MHz)           1-20           0.25-0.5          0.05-0.5      0.5
Number of complex products
  per sample (1)              1              1                 1-50          400
Number of lags                -              10-20             10-100        30 (60 if ACF is formed at IF)
Number of heights             200-2000       20-600            20-1000       20 minimum
Rate for multiply-            100-1000 (2)   4-50              100           200 (3)
  replace-adds (MHz)          0.2-100 (2)    0.04-1.3          0.06-3        20 (4)

NOTES: (1) Multiple products are independent only when signal to noise is
           low. The number of real products is four times the number of
           complex products given.
       (2) Rate for additions only - multiplies not required at this level.
       (3) Rate for unbuffered case.
       (4) Rate for double buffered case.
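Note (1) can be made concrete with a short sketch. The function below (a hypothetical name, not taken from the RSC) accumulates one complex product using integer arithmetic only and shows the four real multiplications behind each complex product. Conjugating the second sample is an assumption made here because it is the usual form for an autocorrelation estimate; the text does not specify it.

```python
def accumulate_complex_product(acc, x1, x2):
    """Add x1 * conj(x2) to acc; all values are (real, imag) integer pairs."""
    (acc_re, acc_im), (a, b), (c, d) = acc, x1, x2
    # (a + jb)(c - jd) = (ac + bd) + j(bc - ad): four real products.
    acc_re += a * c + b * d
    acc_im += b * c - a * d
    return acc_re, acc_im


acc = (0, 0)
for x1, x2 in [((3, 4), (1, 2)), ((-5, 2), (2, -1))]:
    acc = accumulate_complex_product(acc, x1, x2)
print(acc)
```

Each call performs four real multiply-add steps, which is why the real multiply rate is four times the complex product rate.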
The order (i.e., addressing) of the samples sent to the processor and the
ordering of the processed data output to the host computer can be very simple.
In fact, there is never any need for the addressing of these two transfers to be
anything but sequential. For experiments requiring pulse decoding or multiple
lag products, the addresses of samples being supplied to the processing stages
are still highly repetitive, but not completely sequential; more will be said
about this later.
FUNCTIONAL OVERVIEW 
With these ideas in mind, one can easily write a block diagram showing the
data flow for a simple signal processing example where the samples are simply
accumulated before being passed on to the host computer. This has been done in
Figure 1. The addressing of the Output Memory at this level can be assumed to
be as flexible as required by a given experiment. Since there is no input
buffer to temporarily store the samples, each accumulation must be accomplished
within the sample interval. If the same sample is used for several
accumulations (a very typical situation), then the time needed for multiple
fetches and stores to the memory, plus the time for the accumulations, soon
exceeds the sample interval time, even for the fastest logic available. Many
such units could be paralleled together, but one immediately realizes that
typical radar applications have a significant amount of time between the end of
one sample raster and the start of the next raster. The addition of a buffer
memory between the ADC and the accumulator would then allow this extra time to
be utilized, at least partially.
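The simple accumulation of Figure 1 can be sketched in software as follows. This is only an illustration of the data flow, assuming one sample raster per transmitted pulse and one output-memory word per height; the function name and the sample values are hypothetical.

```python
def accumulate_rasters(rasters, n_heights):
    """Sum corresponding samples of successive rasters into an output memory."""
    output_memory = [0] * n_heights          # one accumulator word per height
    for raster in rasters:                   # one raster per transmitted pulse
        for height, sample in enumerate(raster):
            output_memory[height] += sample  # fetch, add, store back
    return output_memory


rasters = [[1, -2, 3], [4, 5, -6], [7, 8, 9]]
print(accumulate_rasters(rasters, n_heights=3))
```

Every sample costs one fetch, one add, and one store. Without an input buffer, all three must finish before the next sample arrives.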
With a single memory between the ADCs and the accumulator, the next
bottleneck arises when the ADC wants to write a sample to the memory at the same
time as the accumulator wants to read some other sample. This would be the
situation in general-purpose processors even with double buffering. (All that
double buffering alleviates is the problem of guaranteeing the validity of the
data before it is over-written with the next sampling sequence, assuming that
the processing keeps up.) This bottleneck can only be eliminated by the use of
two independent input buffers, where one can be written, while the other is
being read. This configuration is illustrated in Figure 2. If the addressing
of the two buffers is also independent, then sampling can proceed at the maximum
rate allowed by the memory with no need to wait for the multiple memory accesses
that may be required for processing.
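A minimal software model of the two independent input buffers is sketched below. It is an analogy only: in hardware the write and the read occur in the same memory cycle, which a sequential program cannot show. The class and method names are hypothetical.

```python
class DoubleInputBuffer:
    """Two independent memories: one written by the ADC, one read for processing."""

    def __init__(self, depth):
        self.memories = [[0] * depth, [0] * depth]
        self.write_select = 0    # models the state of the buffer-select line
        self.write_address = 0   # sequential address counter for ADC writes

    def adc_write(self, sample):
        self.memories[self.write_select][self.write_address] = sample
        self.write_address += 1

    def read(self, address):
        # Processing reads the other memory, at any address and in any order.
        return self.memories[1 - self.write_select][address]

    def swap(self):
        self.write_select = 1 - self.write_select
        self.write_address = 0


buf = DoubleInputBuffer(depth=4)
for sample in (10, 20, 30, 40):      # first raster fills memory 0
    buf.adc_write(sample)
buf.swap()                           # memory 0 becomes the processing side

total = 0
for sample, address in zip((1, 2, 3, 4), (3, 0, 3, 1)):
    buf.adc_write(sample)            # second raster arrives in memory 1 ...
    total += buf.read(address)       # ... while memory 0 is read out of order
print(total)
```

Because the write side and the read side never touch the same memory, repeated or out-of-order reads for processing cannot delay the sequential ADC writes.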
Finally, Figure 3 illustrates the data paths required for maximum throughput
when a multiplier is inserted within the data process stream. Note that
this case shows four Data Input Buffers. Four buffers are needed, even for the
case where the samples loaded into each memory are the same, but where the
multiplications are formed between samples taken at different times (e.g., for a
lag product of an autocorrelation function). These four buffers should be
considered as two independent double buffers, each supplying one of the
multiplicands. In this way, only one memory fetch is needed from each of two
memories for each multiplication. Of course, this assumes that the memory fetch
time is comparable to the multiply time, which, in practice with current
technology, turns out to be true. (If the memories were twice as fast as the
multiplier, so that a double fetch could be accomplished in one cycle, then only
two memories would be needed.)

The Radar Signal Compender is composed of several Functional Modules
(FMs), a Master Controller (MC), an Analog to Digital Conversion module (ADC),
and suitable interfacing to a host computer, as shown in Figure 4. Data from
the ADC is fed directly to the FMs which perform the data processing. In order
to provide flexibility, the host computer can separately program each FM.
Programming includes the setting of the Configuration Register (which specifies
which data processing paths are to be used, thereby determining the data word
size and whether or not the multiplier is to be by-passed) and includes the
loading of the operands that control the addressing of the Data Input Buffers on
the FMs. The latter is described, in detail, in a later section.

Figure 1.          Figure 2.
Data flow within one of the FMs is generally as illustrated in Figure 2 or
3 where each block may represent several stages in the pipeline. The processing
pipeline is actually 9 or 7 stages long and uses either 23 or 18 cycles of the
master clock, depending on whether the multiplier is used or by-passed,
respectively. New data can be stuffed into the pipeline every 5 cycles of the
master clock. The data paths can be up to 64 bits wide, or split up into as
many as four 16-bit-wide paths for multiple independent parallel processing
within each Functional Module. This feature is particularly useful in MST work
where the extra guard bits are not needed. The Poker Flat MST radar will use
the dual 32-bit-wide path configuration, while the incoherent-scatter
applications use either the 48-bit or 64-bit configurations. For each
configuration, the carry bits are appropriately propagated and any overflow
conditions flagged.
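The timing figures above imply a simple cycle count for a run of operations. The sketch below assumes that the 23 and 18 cycle figures are the pipeline latency and that the 5 cycle figure is the interval between successive inputs. These are interpretations of the text, not stated design equations. The function names are hypothetical, and the 50 MHz clock is an illustrative value, not a figure from the design.

```python
def pipeline_cycles(n_operations, latency_cycles, issue_interval=5):
    """Master-clock cycles until the last of n_operations leaves the pipeline."""
    if n_operations <= 0:
        return 0
    # The first result appears after the full latency; each further
    # operation enters issue_interval cycles after the previous one.
    return latency_cycles + issue_interval * (n_operations - 1)


def sustained_rate_mhz(clock_mhz, issue_interval=5):
    """Operations per microsecond once the pipeline is full."""
    return clock_mhz / issue_interval


with_multiplier = pipeline_cycles(1000, latency_cycles=23)
bypassed = pipeline_cycles(1000, latency_cycles=18)
print(with_multiplier, bypassed, sustained_rate_mhz(50.0))
```

For long runs the latency term is negligible, so the sustained rate of one pipeline is set almost entirely by the input interval.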
The Functional Modules have been wired on 11" x 16" boards using a
semi-automatic wire-wrapping service. Most of the 200-plus ICs on each FM either
carry data or are part of one of the address busses. Since little space was
left, much of the combinational logic required to control the FMs was placed in
various PAL (Programmable Array Logic) circuits that must be specially
programmed for the RSC.

An additional feature that had high priority in the design was the
provision for automatic test features. Each of the memories (including the Data
Input Buffers, the Base and Displacement Operand Memories, and the Output Data
Memories) can be loaded with test data from the host or MC and then read back
out again to check memory and data buss integrity. Also, the multiphase clock
can be single stepped to allow probing each stage of the pipeline.

Figure 3.

The Master Control modules are somewhat dependent on the host to be used
with the system. Differences arise from different I/O buss widths, handshaking,
and the number formats (particularly in the integer to floating-point converters
that are included). Control functions are generated and controlled by an
on-board Z80 microprocessor.

ADDRESSING OF THE PROCESSOR INPUT BUFFERS
Although sequential addressing of the Data Input Memories is possible
during raw data input from the ADCs, a random addressing scheme must be provided
for reading the data back out for sample processing. For both pulse decoding
and lag product calculations (which are the most complicated cases) the
addresses can be formed as the sum of two operands: one based on a given
sample referenced to a specific range, and the other determined as a relative
displacement to the other samples that contribute to the calculation of the
desired quantity for that range. This is simply a nested loop structure where the
outer loop indexes the range and where the inner loop indexes the terms that
contribute to that range. The Radar Signal Compender obtains such Base and
Displacement operands for the Data Input Buffers from sequential locations in
three operand memories. Figure 5 diagrams how this addressing scheme is
accomplished.

Figure 4. Block diagram for a system using the Radar Signal Compender.
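The nested loop structure can be sketched as follows. This is a software illustration under stated assumptions, not the RSC address logic: it assumes one Base operand per range and one pair of Displacement operands per term, and it reads both multiplicands from the same sample list for brevity. The function names and the sample values are hypothetical.

```python
def operand_addresses(base_operands, left_displacements, right_displacements):
    """Yield (range_index, left_address, right_address) for every term."""
    for range_index, base in enumerate(base_operands):    # outer loop: range
        for d_left, d_right in zip(left_displacements, right_displacements):
            # inner loop: terms contributing to this range
            yield range_index, base + d_left, base + d_right


def accumulate_products(samples, base_operands, left_disp, right_disp):
    """Accumulate products of the addressed sample pairs, one sum per range."""
    sums = [0] * len(base_operands)
    for r, left, right in operand_addresses(base_operands, left_disp, right_disp):
        sums[r] += samples[left] * samples[right]
    return sums


samples = [1, 2, 3, 4, 5, 6]
# Two ranges (bases 0 and 2); each sums two products of adjacent samples.
print(accumulate_products(samples, base_operands=[0, 2],
                          left_disp=[0, 1], right_disp=[1, 2]))
```

Only the operand tables change from one experiment to the next; the loop structure itself stays fixed, which is what makes a hardware implementation of it practical.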
Each stage in the address computations is also pipelined to maximize the
speed. (Other more general address schemes were not fast enough.) Address
generation using the Base and Displacement operands is applied to one buffer of
each Data Input Buffer pair for data processing while a separate counter
provides sequential addresses to the remaining buffers for data input from the
ADCs. Since the I/O busses, the Data Input Buffers and their addresses are all
independent, no memory cycles are lost from the processing for the I/O
transfers. Selection of the opposite buffer requires only a change in the state
of a control line, a change that takes only a fraction of one microsecond to
accomplish. Hence, the entire time is available for processing the data. This
is a tremendous advantage over the situation in general-purpose processors which
must give up memory cycles even for double buffered I/O. Separate Displacement
operands are provided for the left and right Data Input Memories so that samples
taken at different times can be selected for the multiplier to create the lag
products of an autocorrelation function (ACF). Only the lower 11 bits of the
Base and the two Displacement RAMs (which are 2 k words deep) are used for
address generation; the remaining 5 bits are used for process and address
counter control. Note that the Base Address Counter generates the address for
the Base Operand Memory, while the Displacement Address Counter generates a
common address for both Displacement Operand Memories.

Figure 5. Block diagram for addressing input data for signal processing.

CONCLUSIONS
The basic architecture of the Radar Signal Compender has been illustrated
with respect to the very specific high-speed real-time signal processing
requirements of backscatter radars. A full technical description will be
available in the RSC users manual. The major features of the RSC are listed
below:

(1) Multiple Functional Modules provide many parallel data-processing streams,
each of which is fully pipelined and programmable for maximum throughput and
flexibility.
(2) Multiple Independent Data Input Buffers allow processing to be completely
independent of I/O.
(3) Addressing is sequential for I/O with the RSC, but is flexible for
processing within the RSC.
(4) Integer processing is used with user selectable data path widths.
Sufficient guard bits can be chosen to avoid overflows for even very long
integrations; even so, error checking for overflows is provided.
(5) Full multibit multiplications reduce biases and simplify the computation
of weighting factors for off-line analysis.
Other uses of the RSC are envisioned. For example, since the Input Data
Memories can be loaded directly from the host computer as well as from the ADCs,
the device can also be used as an integer array processor for off-line analysis
of much of our work that begins with Fourier transforms of large amounts of raw
data. (This direct data load feature was originally developed for automatic
testing of the RSC.) Other possible configurations have been considered where
the output of one RSC was fed into another RSC for two-stage processing of the
data. Eventually it may be desirable to substitute floating-point arithmetic
units for the integer units where greater dynamic range is necessary for array
manipulations. Note, however, that there is no reason to go to floating-point
arithmetic for just the initial real-time processing of backscatter radar data.
