Systolic arrays for matrix I/O format conversion by Petkov, N.
  
 University of Groningen




Document Version
Publisher's PDF, also known as Version of record
Publication date:
1988
Citation for published version (APA):
Petkov, N. (1988). Systolic arrays for matrix I/O format conversion. Default journal.
SYSTOLIC ARRAYS FOR MATRIX I/O
FORMAT CONVERSION
Indexing terms: Computer applications, Systolic arrays,
Matrix algebra, I/O format conversion
New 1-slow and 2-slow systolic arrays for row-to-diagonal 
and diagonal-to-row input/output format conversion of 
matrices are given. For n × n matrices, the arrays consist of
n² cells. A 1-slow systolic array performs the conversion in
2n - 1 periods. The arrays can also be used for
column-to-diagonal, diagonal-to-column, row-to-column and
column-to-row input/output format conversions. The 1-slow array
performs the first two of these operations in 2n - 1 periods
each and the last two in 3n - 2 periods each.
Systolic arrays for several important matrix operations have
been a matter of wide interest in the last few years.¹⁻⁵ The
matrices involved are input and output in different formats. We
refer to an input format in which the elements of one matrix
row are input one after another at the same array input as
row input format. (Different matrix rows can be input at
different array inputs.) Column and diagonal input/output
formats are defined by analogy. An orthogonal systolic array' for
the matrix operation C = A . B + 0') is a representative
example in which all three (row, column, and diagonal) I/O
formats are used. Systolic algorithms for I/O format
conversion have been proposed by O’Leary.⁶ In this letter, new
systolic arrays for row-to-diagonal and diagonal-to-row I/O
format conversion are presented. They can also be used for
column-to-diagonal, diagonal-to-column, row-to-column, and
column-to-row I/O format conversion.
The arrays given make use of a switch cell shown in Fig. 1.

c' = c
if c then begin x' = y, y' = x end
else begin x' = x, y' = y end

Fig. 1 Array cell
It is a combinational circuit taking in matrix components x 
and y from the left and from below, respectively, and a control 
bit c from the right. If c is zero, then x is output to the right 
and y up. If c = 1, then x is output up and y to the right. 
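The behaviour of the switch cell described above can be sketched in a few lines of Python. This is only an illustration: the function name and the tuple ordering of the outputs are choices made here, and forwarding the control bit unchanged is an assumption read from Fig. 1.

```python
def switch_cell(x, y, c):
    """Combinational switch cell, sketched from the description in the text.

    x arrives from the left, y from below, c (0 or 1) from the right.
    Returns (right_out, up_out, c_out).
    """
    if c:
        # c = 1: the data paths cross (x goes up, y goes right)
        right_out, up_out = y, x
    else:
        # c = 0: the data pass straight through (x goes right, y goes up)
        right_out, up_out = x, y
    # Assumption: the control bit is passed on unchanged (c' = c)
    return right_out, up_out, c


print(switch_cell("a11", "a21", 0))  # ('a11', 'a21', 0)
print(switch_cell("a11", "a21", 1))  # ('a21', 'a11', 1)
```

Because the cell has no internal state, all timing behaviour of the arrays comes from the clocked delay elements placed between cells.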
A systolic array for row-to-diagonal format conversion of
3 × 3 matrices is shown in Fig. 2. The small black boxes
denote clocked delay elements. In this particular algorithm,
the zero control bits which follow the one control bits
propagating from right to left have no influence on the
computational processes, but they are provided to prepare the
array for a subsequent computation. (Notice that at the
beginning of the computation, a zero control bit should reside
in the second cell of the last array row and in the last cell of
the second row, in order that the array functions properly. These
zero bits can be regarded as a non-obligatory remnant of a
previous computation.) The reader is invited to check the
correctness of this and the other algorithms given below by
simulating the array's operation for several clock periods. For
n × n matrices, the conversion requires n² cells and 3n - 2 clock
periods. The array given in Fig. 2 is classified as a 2-slow
circuit (after the terminology of Leiserson et al.),⁷ and can
readily be transformed into a 1-slow circuit by removing every
second delay element in each directed interconnection path
(Fig. 3). The data input should be changed, the most
significant change being the removal of the idle periods between
consecutive matrix elements. The 1-slow systolic array for
row-to-diagonal conversion of an n × n matrix consists of n²
cells and performs the conversion in 2n - 1 clock periods.

Fig. 3 1-slow systolic array for row-to-diagonal format conversion
(n = 3); input data at time t, output data at time t + 2n - 1
Fig. 4 shows the 2-slow systolic array from Fig. 2, this time
functioning as a diagonal-to-row format converter. The
conversion requires 3n - 2 clock periods. The same input/output
operations can be applied to the 1-slow systolic array shown
in Fig. 3, yielding a (1-slow) systolic algorithm for
diagonal-to-row format conversion which requires 2n - 1 clock
periods (not shown).
The 1-slow and 2-slow systolic arrays given above can also
be used for column-to-diagonal and diagonal-to-column
format conversion. For example, a systolic algorithm for
column-to-diagonal format conversion can be obtained from
the corresponding row-to-diagonal systolic algorithm by
inputting columns B_1, B_2, ..., B_n instead of rows A_1, A_2, ...,
A_n, respectively, and outputting diagonals D_{n-1}, ..., D_{1-n}
instead of diagonals D_{1-n}, ..., D_{n-1}, respectively. (D_k,
k = 1 - n, ..., n - 1, denotes the diagonal consisting of those
matrix elements a_ij for which j - i = k.)
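As an illustration of the diagonal format (not of the systolic array itself), the following Python sketch lists the diagonals of a small matrix using the rule j - i = k. The function name `diagonals` and the list-of-rows representation are choices made here; indices are 0-based in the code, which leaves j - i unchanged.

```python
def diagonals(a):
    """Map each k in 1-n .. n-1 to the elements a[i][j] with j - i = k."""
    n = len(a)
    return {
        k: [a[i][i + k] for i in range(n) if 0 <= i + k < n]
        for k in range(1 - n, n)
    }


a = [[11, 12, 13],
     [21, 22, 23],
     [31, 32, 33]]
d = diagonals(a)
print(d[0])   # [11, 22, 33]  main diagonal
print(d[2])   # [13]          top-right corner
print(d[-2])  # [31]          bottom-left corner

# Feeding columns instead of rows amounts to transposing the matrix,
# which maps diagonal k onto diagonal -k.
t = [list(col) for col in zip(*a)]
print(all(diagonals(t)[k] == d[-k] for k in d))  # True
```

The last check shows why the order of the output diagonals is reversed when columns are input in place of rows.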
Fig. 2 2-slow systolic array for row-to-diagonal format conversion (n = 3);
output data at time t + 3n - 2

ELECTRONICS LETTERS 23rd June 1988 Vol. 24 No. 13

The arrays given can further be used for row-to-column and
column-to-row I/O format conversion. For example, a
row-to-column conversion is performed by first carrying out a
row-to-diagonal conversion and then a diagonal-to-column
conversion. Using the 1-slow systolic array given above (Fig. 3),
each of these operations can be carried out in 2n - 1 clock
periods. However, the last n periods of the first operation can
be overlapped with the first n periods of the second operation,
so that the row-to-column conversion can be performed in 3n - 2
clock periods.
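The count of 3n - 2 periods follows directly from the overlap described above. As a worked check (the symbol T is a label introduced here, not the author's notation):

```latex
\begin{align*}
T_{\text{row}\to\text{column}}
  &= \underbrace{(2n-1)}_{\text{row-to-diagonal}}
   + \underbrace{(2n-1)}_{\text{diagonal-to-column}}
   - \underbrace{n}_{\text{overlapped periods}} \\
  &= 3n - 2 .
\end{align*}
```

For n = 3 this gives 5 + 5 - 3 = 7 clock periods.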
Fig. 4 2-slow systolic array for diagonal-to-row format conversion
(n = 3); input data at time t, output data at time t + 3n - 2
Next, the performance of the (1-slow) algorithms given
above is compared with alternative designs. O’Leary⁶ has
given three different systolic arrays for row-to-column,
row-to-diagonal, and diagonal-to-row format conversion,
respectively. They perform the conversion in 3n - 1, 4n - 1,
and 2n - 1 clock periods, respectively, while the 1-slow
systolic array given above requires 3n - 2, 2n - 1, and 2n - 1
clock periods, respectively. The 1-slow systolic array given
above has the following advantages: one and the same array is
used for all conversions as opposed to the three different arrays
used by O’Leary; only one cell type is used as opposed to the
three cell types for the diagonal-to-row conversion array given
by O’Leary; all I/O operations take place in boundary cells as
against I/O operations at internal cells (a major disadvantage
for VLSI implementation) for the array of O’Leary mentioned
last; the row-to-diagonal conversion is twice as fast.
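The period counts quoted in this comparison can be tabulated for concrete matrix sizes. The sketch below only restates the formulas from the text; the function name `periods` and the dictionary keys are illustrative choices.

```python
def periods(n):
    """Clock periods quoted in the text for n x n matrices."""
    return {
        "row-to-column":   {"oleary": 3 * n - 1, "one_slow": 3 * n - 2},
        "row-to-diagonal": {"oleary": 4 * n - 1, "one_slow": 2 * n - 1},
        "diagonal-to-row": {"oleary": 2 * n - 1, "one_slow": 2 * n - 1},
    }


for n in (3, 16, 256):
    p = periods(n)["row-to-diagonal"]
    ratio = p["oleary"] / p["one_slow"]
    print(n, p["oleary"], p["one_slow"], round(ratio, 2))
```

For row-to-diagonal conversion the ratio is (4n - 1)/(2n - 1) = 2 + 1/(2n - 1), so the factor of two is approached from above as n grows.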
N. PETKOV 22nd February 1988
International Basic Laboratory for Image Processing & Computer Graphics
Central Institute for Cybernetics & Information Processes
Kurstr. 33, PO Box 1298, Berlin 1086, German Democratic Republic
References 
1 BRENT, R. P., KUNG, H. T., and LUK, F. T.: ‘Some linear-time
algorithms for systolic arrays’. Proc. IFIP 9th World Comput. Cong.,
Sept. 1983, pp. 865-876
2 GENTLEMAN, W. M., and KUNG, H. T.: ‘Matrix triangularization by
systolic arrays’. SPIE, Vol. 298, Real Time Signal Processing IV,
3 KUNG, H. T., and LEISERSON, C. E.: ‘Systolic arrays (for VLSI)’.
Sparse Matrix Proc. 1978, Society for Industrial and Applied
Mathematics, 1979, pp. 256-282
4 PREPARATA, F. P., and VUILLEMIN, J.: ‘Optimal IC implementation
of triangular matrix inversion’. Proc. Int’l. Conf. Parallel
Processing, 1980, pp. 211-216
5 HWANG, K., and CHENG, Y. H.: ‘VLSI computing structure for
solving large scale linear systems of equations’. Proc. Int’l. Conf.
Parallel Processing, 1980, pp. 217-227
6 O’LEARY, D. P.: ‘Systolic arrays for matrix transpose and other
reorderings’, IEEE Trans., 1987, C-36, pp. 117-122
7 LEISERSON, C. E., ROSE, F. M., and SAXE, J. B.: ‘Optimizing
synchronous circuitry by retiming’. Proc. 3rd Caltech Conf. VLSI, R.
Bryant (Ed.) (Rockville MD, Computer Sci. Press, 1983), pp.





PERFORMANCE ANALYSIS OF CA-CFAR IN  
THE PRESENCE OF COMPOUND GAUSSIAN 
CLUTTER 
Indexing terms: Radar, Clutter, CFAR, Radar receivers 
Exact expressions are derived for the ROCs of a CA-CFAR 
radar processor subject to both internal noise and clutter (the 
last one modelled as compound Gaussian). They include pre- 
viously known ROCs, e.g. for ideal CFAR. The effect of finite 
sample size is elicited in the example of K-distributed clutter. 
Non-Gaussian clutter is of primary concern for high- 
resolution radars. Deviations from the Gaussian statistics may 
be considered to arise from the concurrence of two reflectivity 
phenomena: the first one is a slow fluctuation of backscatter 
coefficient induced by physical variations of the scanned
domain; the second one is a superimposed fast fluctuation for 
fixed backscatter coefficient. If the resolution cell is small 
enough the radar return is conditionally Gaussian 
(conditioned on a fixed illuminated patch), but its overall dis- 
tribution may be non-Gaussian since it is affected by the sta- 
tistics of the slow fluctuation. This is the so-called composite 
scattering model.¹
Correspondingly, a mathematical model for the complex
envelope $\tilde{z}(t)$ of the clutter is the so-called compound
Gaussian model:

$$\tilde{z}(t) = a\,\tilde{x}(t) \qquad (1)$$

where $\tilde{x}(t)$, the ‘speckle’ component, is a Gaussian complex
process with zero mean and variance $\sigma^2$ accounting for the
fast fluctuation; $a$, the ‘spiky’ component, is a random
variable, independent of $\tilde{x}(t)$, whose distribution fits the
overall statistics of the slow fluctuation. In considering $a$ as a
random variable instead of a random process, the assumption is made
that the illuminated patch is small enough that the slow
fluctuation over time can be neglected. To achieve a constant false
alarm rate in such a time-varying and nonhomogeneous
environment some sort of adaptivity is required. We focus on
the adaptive thresholding technique known as cell averaging
(CA). The basic CA-CFAR scheme is shown in Fig. 1 with
reference to averaging in range. Following square-law
detection, echoes from a number of cells surrounding and including
the cell being probed are used to estimate the background
noise, so that the detection threshold can be adapted
accordingly.

Fig. 1 Schema of cell averaging constant false alarm rate processor
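A minimal sketch of the cell-averaging threshold test described above is given below. It is not the processor analysed in the letter: the function name, the `threshold_factor` parameter and the sample values are all hypothetical, and the choice of which range cells form the reference window is left to the caller.

```python
def ca_cfar_detect(cell_power, reference_powers, threshold_factor):
    """Cell-averaging threshold test on square-law detected samples.

    cell_power       : power of the cell being probed
    reference_powers : powers of the cells used to estimate the background
    threshold_factor : hypothetical multiplier setting the false alarm rate
    Returns (detection, adaptive_threshold).
    """
    if not reference_powers:
        raise ValueError("need at least one reference cell")
    # Background level estimated as the arithmetic mean of the reference cells
    background = sum(reference_powers) / len(reference_powers)
    threshold = threshold_factor * background
    return cell_power > threshold, threshold


reference = [1.0, 0.5, 1.5, 1.25, 0.75]      # illustrative powers, mean 1.0
print(ca_cfar_detect(6.0, reference, 4.0))   # (True, 4.0)
print(ca_cfar_detect(3.0, reference, 4.0))   # (False, 4.0)
```

Because the threshold scales with the estimated background, a uniform change in clutter power rescales both sides of the comparison, which is what keeps the false alarm rate constant in the idealised case.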
Assuming that the useful signal $\tilde{s}(t)$ fluctuates from pulse to
pulse according to a Swerling model and considering receiver
noise $\tilde{n}(t)$ in addition to clutter, the hypothesis testing problem
can be formulated as follows:

$$H_0:\ \tilde{r}(t) = \tilde{n}(t) + a\,\tilde{x}(t)$$
$$H_1:\ \tilde{r}(t) = \tilde{n}(t) + a\,\tilde{x}(t) + \tilde{s}(t)$$

where $\tilde{n}(t)$, $\tilde{x}(t)$, and $\tilde{s}(t)$ are complex zero-mean Gaussian
variables with variances $\sigma_n^2$, $\sigma^2$ and $\sigma_s^2$ respectively and $a$ has unit
RMS.
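The two hypotheses can be explored numerically with a small Monte Carlo sketch. Everything named here is an assumption made for illustration: the function and parameter names are hypothetical, and the spiky component is drawn so that its square is gamma distributed with unit mean, a common way to obtain K-distributed clutter that satisfies the unit-RMS condition. The spiky component is redrawn for every sample, so the averages below are ensemble averages.

```python
import math
import random


def complex_gaussian(variance, rng):
    """Zero-mean circular complex Gaussian sample with E|z|^2 = variance."""
    s = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def spiky_component(shape, rng):
    """Spiky factor with unit RMS: its square is gamma with mean 1 (assumed)."""
    return math.sqrt(rng.gammavariate(shape, 1.0 / shape))


def received_sample(rng, noise_var, speckle_var, signal_var=0.0, shape=1.0):
    """One received sample under H0 (signal_var == 0) or H1 (signal_var > 0)."""
    noise = complex_gaussian(noise_var, rng)
    clutter = spiky_component(shape, rng) * complex_gaussian(speckle_var, rng)
    r = noise + clutter
    if signal_var > 0.0:
        # Pulse-to-pulse fluctuating target modelled as complex Gaussian
        r += complex_gaussian(signal_var, rng)
    return r


rng = random.Random(0)
h0 = [abs(received_sample(rng, 0.1, 1.0)) ** 2 for _ in range(20000)]
# Mean square-law output should be near noise_var + speckle_var = 1.1
print(sum(h0) / len(h0))
```

Smaller values of `shape` give spikier clutter with a heavier tail at the same mean power, which is the regime where a finite reference window matters most.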
The performance of this basic scheme has been previously 
considered in References 2 and 3 for the case that the clutter 
envelope is K-distributed. The assumption is made there that 
the speckle component is completely averaged out by the CA 
process or, equivalently, that the spiky component is
completely known: this is the ‘ideal CFAR’ since it implies
constancy of the clutter (constant a) in an arbitrarily large sample
set. Following such an approach, the operating characteristics
