Pipeline synthetic aperture radar data compression utilizing systolic binary tree-searched architecture for vector quantization by Curlander, John C. et al.
US005477221A 
United States Patent [19] [11] Patent Number: 5,477,221
Chang et al. [45] Date of Patent: Dec. 19, 1995
[54] PIPELINE SYNTHETIC APERTURE RADAR
DATA COMPRESSION UTILIZING
SYSTOLIC BINARY TREE-SEARCHED
ARCHITECTURE FOR VECTOR
QUANTIZATION
[75] Inventors: Chi-Yung Chang, Torrance; Wai-Chi 
Fang, San Marino; John C. Curlander, 
Pasadena, all of Calif. 
[73] Assignee: The United States of America as 
represented by the Administrator of 
the National Aeronautics and Space 
Administration, Washington, D.C. 
[21] Appl. No.: 44,092 
[22] Filed: Mar. 22, 1993 
Related U.S. Application Data 
[63] Continuation of Ser. No. 550,775, Jul. 10, 1990, abandoned.
[51] Int. Cl.6 ...................................................... H03M 7/00
[52] U.S. Cl. ................................................. 341/51; 341/79
[58] Field of Search ................................ 341/51, 79, 106, 
341/161, 200; 358/21 R, 340/735 
[56] References Cited
U.S. PATENT DOCUMENTS
4,670,851 6/1982 Murakami et al. ..................... 364/518
5,010,401 4/1991 Murakami et al. ..................... 358/136 
OTHER PUBLICATIONS 
C. Y. Chang, et al., “Data Compression of Synthetic Aperture Radar Imagery,” JPL D-5210, 1988.
P. R. Cappello, et al., “A Systolic Vector Quantization Processor for Real-Time Speech Coding,” Proc. of ICASSP, pp. 2143-2146, Tokyo, 1986.
R. Dianysian, et al., “A VLSI Chip Set for Real-Time Vector Quantization of Image Sequences,” Proc. of Int. Symp. on Circuits and Systems, May 1987.
C. Y. Chang, et al., “Systolic Array Processing of the Viterbi Algorithm,” IEEE Trans. on Information Theory, vol. 35, No. 1, pp. 76-86, Jan. 1989.
Primary Examiner-Brian K. Young 
Attorney, Agent, or Firm-John H. Kusmiss; Thomas H.
Jones; Guy M. Miller
[57] ABSTRACT
A system for data compression utilizing a systolic array architecture for Vector Quantization (VQ) is disclosed for both full-searched and tree-searched VQ. For a tree-searched VQ, the special case of a Binary Tree-Search VQ (BTSVQ) is disclosed with identical Processing Elements (PE) in the array for both a Raw-Codebook VQ (RCVQ) and a Difference-Codebook VQ (DCVQ) algorithm. A fault tolerant system is disclosed which allows a PE that has developed a fault to be bypassed in the array and replaced by a spare at the end of the array, with the codebook memory assignment shifted one PE past the faulty PE of the array.
4 Claims, 12 Drawing Sheets 
[Front-page drawing: SAR Processor; VQ Compressor comprising a VQ Codebook Memory Bank and a PE Array; Downlink Packetizer; EOS Control and Data System (CDS).]
https://ntrs.nasa.gov/search.jsp?R=19960016849
U.S. Patent Dec. 19, 1995 Sheet 1 of 12 5,477,221
[Drawing sheets 1-5: figure graphics not recoverable from the text extraction.]
[Drawing sheet 6, FIG. 6: RCPE functional block diagram. Inputs: image data DI(7:0) and the codevector pair DC(7:0), DC(15:8) from the sub-codebook memory module. Internal blocks: adders, a multiplexer (MUX), intermediate results R(8:0) and R(19:0), and a comparator. Outputs: pipelined image data DO(7:0) and index Hn to the next memory module.]
[Drawing sheet 11, FIG. 10: DCPE implementation. Signals: VCLK, DP(8:0), AP(3:0), AB(8:0), WRCD*, PCLK1, DI(7:0), D(19:0), WRCSW, and EXTCD(8:0) from external memory. Blocks: difference subcodebook memory (512 x 9; 32 x 20 for the subcodebook at level 6) with outputs CD(8:0) and CSD(19:0), pipeline buffer, MDR(9:0), complementors, comparator, and ID(9:0).]
PIPELINE SYNTHETIC APERTURE RADAR
DATA COMPRESSION UTILIZING
SYSTOLIC BINARY TREE-SEARCHED
ARCHITECTURE FOR VECTOR
QUANTIZATION
ORIGIN OF THE INVENTION
The invention described herein was made in the perfor- 
mance of work under a NASA contract, and is subject to the 
provisions of Public Law 96-517 (35 USC 202) in which the 
Contractor has elected not to retain title. This is a continu- 
ation of application Ser. No. 07/550,775 filed Jul. 10, 1990, 
now abandoned. 
TECHNICAL FIELD
The invention relates to a systolic array architecture for a 
Vector Quantizer (VQ) for real-time compression of data to
reduce the data communication and/or archive costs, and 
particularly to a tree-searched VQ. 
BACKGROUND ART 
Efficient data compression to reduce the data volume 
significantly decreases both data communication and 
archive costs. Among existing data compression algorithms, 
Vector Quantization (VQ) has been demonstrated to be an 
effective method capable of producing good reconstructed 
data quality at high compression ratios. The primary advan- 
tage of the VQ algorithm, as compared to other high 
compression ratio algorithms such as the adaptive transform 
coding algorithm, is its extremely simple decoding procedure. This makes VQ a technique of great potential for single-encoder, multiple-decoder data compression systems.
The VQ algorithm has been selected as the data compres- 
sion algorithm to be used for rapid electronic transfer of 
browse image data from an on-line archive system to end 
users of the Alaska SAR Facility (ASF) and the Shuttle
Imaging Radar C (SIR-C) ground data systems. For this 
on-line archive application, VQ is required to reduce the 
volume of browse image data by a factor of 15 to 1 so that 
the data can be rapidly transferred through the Space Physics 
Analysis Network (SPAN) having a 9600 bits per second 
data rate, and be accurately reconstructed at the sites of 
scientific users. 
Another application of the VQ algorithm is the real-time 
downlink of the Earth Observing System (EOS) on-board 
processor data to the ground data users. For this data 
downlink application, VQ is required to reduce the volume 
of image data produced by the on-board processor by a 
factor of 7 to 1 so that the data can be transferred in real time through the direct downlink channel, which is limited to a 1 Megabit per second data rate. These flight projects are being undertaken by the National Aeronautics and Space Administration (NASA) for imaging and monitoring of global environmental changes.
Aside from these space applications, VQ can also be 
applied to a broad area in commercial industry for data 
communication and archival applications, such as digital 
speech coding over telephone lines, High Definition TV 
(HDTV) video image coding and medical image coding. 
Vector quantization is a generalization of scalar quanti- 
zation. In vector quantization, the input data is divided into 
many small data blocks (i.e., data vectors). The quantization
levels (i.e., codevectors) are vectors of the same dimension 
as the input data vectors. A general functional block diagram 
for vector quantization is shown in FIG. 1. A codebook 10 
comprised of codevectors $C_0, C_1, \ldots, C_{N-1}$ is used at the transmit end of a communication channel 11 for data encoding, and a duplicate codebook 10' is used at the receive end for data decoding. An encoding functional block 12 carries out the algorithm indicated by

$$ i^{[k]} = \arg\min_{0 \le i \le N-1} D\bigl(x^{[k]}, C_i\bigr) \qquad (1) $$

where: $x^{[k]}$ represents the input data vector at time k; $C_i$ is the codevector; $D(x^{[k]}, C_i)$ is the distortion function; N is the total number of codevectors; and $i^{[k]}$ is the optimal codevector index. The procedure defined by that equation selects the stored codevector that yields the minimum distortion between an input data vector $x^{[k]}$ and the stored codevectors $C_0, C_1, \ldots, C_{N-1}$. The optimal index $i^{[k]}$ transmitted through the channel 11 is used at the receive end for the decoding function in block 13. Decoding is carried out by using the index to look up the codevector $C_{i^{[k]}}$ in the codebook 10'. That codevector is then used as the reconstructed data vector $\hat{x}^{[k]}$, which closely approximates the original data vector $x^{[k]}$. The decoding procedure can be expressed as

$$ \hat{x}^{[k]} = C_{i^{[k]}} \qquad (2) $$

which is a table look-up procedure. Data compression is achieved because fewer bits are needed to represent the codevector indices than the input data vectors.
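As a minimal illustration of Equations (1) and (2), the Python sketch below assumes a squared-error distortion D and a small hypothetical codebook. The function names `full_search_encode` and `decode` are illustrative only; they do not come from the patent. The encoder evaluates every codevector. The decoder is a single table look-up.

```python
from typing import List, Sequence

Vector = Sequence[float]

def distortion(x: Vector, c: Vector) -> float:
    # Squared-error distortion D(x, C_i); an assumed choice of D.
    return sum((xj - cj) ** 2 for xj, cj in zip(x, c))

def full_search_encode(x: Vector, codebook: List[Vector]) -> int:
    # Equation (1): index of the codevector with minimum distortion
    # (ties resolve to the lowest index).
    return min(range(len(codebook)), key=lambda i: distortion(x, codebook[i]))

def decode(index: int, codebook: List[Vector]) -> Vector:
    # Equation (2): decoding is a table look-up.
    return codebook[index]

# Hypothetical 4-entry codebook of 2-component vectors.
codebook = [[0, 0], [0, 10], [10, 0], [10, 10]]
i = full_search_encode([9, 2], codebook)
print(i, decode(i, codebook))  # 2 [10, 0]
```

Only the 2-bit index is transmitted; the receiver needs nothing but the duplicate codebook.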
The codebook is generated by training on a subset of the source data, so the performance of the codebook is highly dependent on the similarity between the training data and the coded data. Once the codebook is generated, encoding need only involve computing the distortion between each input data vector and all of the stored codevectors, and then selecting the best match. This algorithm is known as the full-searched VQ algorithm.
The major drawback of the full-searched VQ algorithm is 
the high complexity involved in drawing up (training) the 
codebook and then data encoding, which poses a great 
challenge for real-time application. To reduce the encoding complexity, the tree-searched VQ algorithm is employed. Its complexity grows only linearly, rather than exponentially, as the codebook bit length increases. For the tree-searched VQ, the codebook is divided into several tree levels, as illustrated in FIG. 2 for a 2-level tree-structured codebook. In the encoding process, the input data vector $x^{[k]}$ is first compared with the first level codebook $C_0, C_1, \ldots, C_{N_1-1}$. Based on the selected codevector, the input data vector $x^{[k]}$ is then compared with the codevectors of the corresponding second level subcodebook $C_{i_1^{[k]},0}, C_{i_1^{[k]},1}, \ldots, C_{i_1^{[k]},N_2-1}$, where $i_1^{[k]}$ is the index selected at the first level. This
encoding procedure is repeated until the input data vector is 
compared with the last level subcodebook. The best matched 
codevector at the last level subcodebook is then used to 
represent this input data vector. 
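The two-level search of FIG. 2 can be sketched in Python. The sketch again assumes a squared-error distortion. The `level1`/`level2` layout is a hypothetical choice for illustration, and so is the mixed-radix combined index `i1 * N2 + i2`, which is a bit concatenation when N2 is a power of two. Only N1 + N2 distortions are evaluated, instead of N1 x N2 for a full search.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def distortion(x: Vector, c: Vector) -> float:
    # Squared-error distortion, an assumed choice of D(x, C).
    return sum((xj - cj) ** 2 for xj, cj in zip(x, c))

def nearest(x: Vector, candidates: List[Vector]) -> int:
    # Full search within one (sub)codebook: index of minimum distortion.
    return min(range(len(candidates)), key=lambda i: distortion(x, candidates[i]))

def tree_search_encode(x: Vector, level1: List[Vector],
                       level2: List[List[Vector]]) -> Tuple[int, int, int]:
    """Two-level tree search.

    level1 holds the N1 first-level codevectors; level2[i1] is the
    subcodebook of N2 codevectors reached from first-level index i1
    (a hypothetical layout).
    """
    i1 = nearest(x, level1)        # compare with C_0 ... C_{N1-1}
    i2 = nearest(x, level2[i1])    # compare only with the selected subcodebook
    n2 = len(level2[i1])
    return i1, i2, i1 * n2 + i2    # combined index: i1 followed by i2

level1 = [[0, 0], [10, 10]]
level2 = [[[0, 0], [0, 4], [4, 0], [4, 4]],
          [[6, 6], [6, 10], [10, 6], [10, 10]]]
print(tree_search_encode([9, 7], level1, level2))  # (1, 2, 6)
```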
STATEMENT OF THE INVENTION 
An objective of this invention for real-time data compres- 
sion is to employ a systolic process in the VQ encoding 
procedure by taking advantage of the regular data flow 
pattern inherent in the VQ algorithm, particularly with a 
tree-searched codebook. By a combination of tree-searched 
VQ and systolic processing, a high throughput data com- 
pressor can be realized at a low hardware cost to meet the 
real-time rate requirement. This is the main theme of this 
invention. Thus, the primary objective of this invention is to 
provide a data compression system that can achieve a 
real-time encoding rate with small hardware cost utilizing 
systolic array architecture for a tree-searched VQ algorithm.
The systolic array consists of a network of identical Processing Elements (PEs) that process and pass data among themselves. It is based on design principles such as modularity, regular data flow, simple structure, localized communication, simple global control, and parallel/pipeline processing functions. The systolic array is an effective architecture for the implementation of matrix type computation.
This invention applies the systolic array architecture to both full-searched and tree-searched VQ. Briefly, the encoding procedure of a full-searched VQ can be formulated as a matrix-vector computation in a general form. In this form, the multiply operator represents the scalar distortion computation and the add operator represents the summation of weighted scalar distortions. The encoding procedure of a tree-searched VQ can be formulated as a series of matrix-vector computations with proper access to codevectors in the subcodebooks. Examples are specifically given for a Binary Tree-Searched VQ (BTSVQ) of both a raw codebook and a difference codebook, referred to hereinafter as RCVQ and DCVQ, respectively.
A secondary objective of this invention is to provide a fault tolerant systolic VQ encoder to enhance system reliability. This is done by including a spare Processing Element (PE) in a systolic array of PEs, together with a means for detecting a faulty PE and replacing it with the spare PE.
The novel features that are considered characteristic of this invention are set forth with particularity in the appended claims. The invention will best be understood from the following description when read in connection with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a generalized functional block diagram of the prior-art vector quantization (VQ) algorithm.
FIG. 2 is a diagram of the prior-art encoding procedure of the 2-level tree-searched vector quantization algorithm.
FIG. 3 illustrates a block diagram of a systolic full-search vector quantizer.
FIG. 4 illustrates a systolic architecture for a Binary Tree-Search Vector Quantizer (BTSVQ).
FIG. 5 illustrates major functional blocks of a systolic binary tree-searched VQ encoder as applied to the EOS on-board SAR processor.
FIG. 6 illustrates a functional block diagram of a BTSVQ Processing Element (PE) in the system of FIG. 5 for RCVQ.
FIGS. 7a and 7b together illustrate a detailed functional design of the memory bank shown in FIG. 5.
FIG. 8 illustrates a major functional block diagram for a systolic vector quantizer in which each vector quantization processing element has its own codebook memory.
FIG. 9 illustrates the distortion computing data path of the processing elements in the system of FIG. 8.
FIG. 10 illustrates a preferred implementation for the processing elements of FIG. 9.
FIG. 11 illustrates fault tolerance augmentation of a systolic vector quantization array using a spare processing element and dynamic reconfiguration switches for replacing a processing element when it is found to have a fault.
DETAILED DESCRIPTION OF THE INVENTION
As noted hereinbefore, Vector Quantization (VQ) is essentially a generalization of scalar quantization. For input image data, the stream of input pixels is divided into vectors (small blocks of pixels, e.g., 4x4 pixel blocks). For a full-searched VQ, each input data vector is compared with every vector stored in a codebook. The index of the codebook vector of the smallest distortion is chosen as the encoded quantization vector to be transmitted. To reduce the encoding complexity, the tree-searched VQ technique is employed. This technique divides the codebook into levels of subcodebooks of a tree structure, as illustrated in the background art section. The input data vector is successively compared with the stored codevectors in the subcodebook levels, i.e.,

$$
\begin{aligned}
i_1^{[k]} &= \arg\min_{0 \le i_1 \le N_1-1} D\bigl(x^{[k]}, C_{i_1}\bigr) \\
i_2^{[k]} &= \arg\min_{0 \le i_2 \le N_2-1} D\bigl(x^{[k]}, C_{i_1^{[k]} i_2}\bigr) \\
&\;\;\vdots \\
i_L^{[k]} &= \arg\min_{0 \le i_L \le N_L-1} D\bigl(x^{[k]}, C_{i_1^{[k]} i_2^{[k]} \cdots i_{L-1}^{[k]} i_L}\bigr) \\
i^{[k]} &= i_1^{[k]} i_2^{[k]} \cdots i_L^{[k]}
\end{aligned}
\qquad (3)
$$

where $x^{[k]}$ is the input data vector sequence and k represents the time index. The codevector notation is $C_{i_1}$ for level 1, $C_{i_1 i_2}$ for level 2, and so forth, with $C_{i_1 i_2 \cdots i_L}$ for level L. The distortion function is $D(x^{[k]}, C_{i_1 \cdots})$ and the output coded data sequence is $i^{[k]}$. The number of bits per input pixel is K and the input vector dimension is m pixels. Decoding is still a table look-up procedure,

$$ \hat{x}^{[k]} = C_{i^{[k]}} = C_{i_1^{[k]} i_2^{[k]} \cdots i_L^{[k]}} \qquad (4) $$

The compression ratio is Km/n for a fixed codebook scheme, where $n = n_1 + n_2 + \cdots + n_L$ is the total codebook bit length. The codebook memory size is

$$ \bigl(2^{n_1} + 2^{n_1+n_2} + \cdots + 2^{n_1+n_2+\cdots+n_L}\bigr)\, mK \text{ bits}, $$

where $n_l$ represents the subcodevector bit length at level $l$, $1 \le l \le L$, and $N_l = 2^{n_l}$ represents the number of codevectors. The encoding complexity is

$$ 2^{n_1} + 2^{n_2} + \cdots + 2^{n_L} $$

operations per pixel. Compression ratios are more easily controlled by adjusting m (vector dimension), because a variation in n (codebook bit length) significantly affects the codebook size and the encoding complexity.
The Binary Tree-Searched VQ (BTSVQ) is a special case of tree-searched VQ. For the BTSVQ, the number L of tree levels is equal to the codebook bit length (n). The encoding of the BTSVQ can be expressed as

$$ i_1^{[k]} = \arg\min_{i_1 = 0,1} D\bigl(x^{[k]}, C_{i_1}\bigr) \qquad (5) $$
$$ i_2^{[k]} = \arg\min_{i_2 = 0,1} D\bigl(x^{[k]}, C_{i_1^{[k]} i_2}\bigr) $$

$$ \vdots $$

$$ i_n^{[k]} = \arg\min_{i_n = 0,1} D\bigl(x^{[k]}, C_{i_1^{[k]} i_2^{[k]} \cdots i_{n-1}^{[k]} i_n}\bigr) $$

$$ i^{[k]} = i_1^{[k]} i_2^{[k]} \cdots i_n^{[k]} $$

For an RCVQ (a raw-codebook BTSVQ), the distortion computation between the input vector $x^{[k]}$ and the codevectors at the same binary tree level ($C_0$ and $C_1$) is

$$ D(x^{[k]}, C_0) = \sum_{j=0}^{m-1} \bigl( x^{[k]}(j)^2 + C_0(j)^2 - 2\,x^{[k]}(j)\,C_0(j) \bigr) $$
$$ D(x^{[k]}, C_1) = \sum_{j=0}^{m-1} \bigl( x^{[k]}(j)^2 + C_1(j)^2 - 2\,x^{[k]}(j)\,C_1(j) \bigr) \qquad (6) $$

The codebook memory size is $(2^{n+1} - 2)mK$ bits. The encoding complexity is $2n$ operations per pixel.
For a DCVQ, namely a difference-codebook BTSVQ, the distortion computation between the input vector $x^{[k]}$ and the codevectors at the same binary tree level ($C_0$ and $C_1$) is simplified as follows:

$$ \frac{D(x^{[k]}, C_0) - D(x^{[k]}, C_1)}{2} = \sum_{j=0}^{m-1} \frac{C_0(j)^2 - C_1(j)^2}{2} - \sum_{j=0}^{m-1} x^{[k]}(j)\bigl[C_0(j) - C_1(j)\bigr] = \Delta - \sum_{j=0}^{m-1} x^{[k]}(j)\,\delta(j) \qquad (7) $$

Instead of saving $C_0(j)$ and $C_1(j)$, the subcodebook stores the term

$$ \Delta = \sum_{j=0}^{m-1} \frac{C_0(j)^2 - C_1(j)^2}{2} \qquad (8) $$

and the terms $\delta(j) = C_0(j) - C_1(j)$. The codebook memory size is $(2^n - 1)[m(K+1) + (2K + \log m)]$ bits. The encoding complexity is $n$ operations per pixel.
The DCVQ is an improved version of the RCVQ. For the DCVQ, the encoding and hardware complexity is reduced to half of that of the RCVQ. This is a unique characteristic of a BTSVQ.
Systolic Architecture for the Full-Searched VQ
For most distortion measures, such as the weighted mean square error, the vector distortion can be shown to be the weighted sum of the scalar distortions, i.e.,

$$ d(i) = D(x, C_i) = \sum_{j=0}^{m-1} w(j)\, D\bigl(x(j), C_i(j)\bigr) \qquad (9) $$

for $0 \le i \le N-1$ and $0 \le j \le m-1$. Here $x(j)$ represents the $j$th component of the input data vector, $C_i(j)$ the $j$th component of the $i$th codevector, $w(j)$ the weighting factor in the distortion measure, and $d(i)$ the distortion between $x$ and $C_i$. The index of the codevector of the minimum distortion represents the coded data of the input data vector, i.e.,

$$ i^{[k]} = \arg\min_{0 \le i \le N-1} d(i) \qquad (10) $$

For this class of distortion measure, the encoding procedure of the full-searched VQ shown in Equation (1) can be expressed in a general matrix-vector multiplication form. Here the multiply operator represents the evaluation of scalar distortion and the add operator is the summation of the weighted scalar distortions. Therefore, Equation (9) can be processed systolically, since matrix type computations are well suited for systolic processing.
A systolic architecture for the full-searched VQ may thus be an array of processors 0, 1, ..., N-1 and codebooks 0, 1, ..., N-1, as shown in FIG. 3. Each codebook holds a stored codevector comprised of m components $C_i(0), C_i(1), \ldots, C_i(m-1)$. The distortion parameter $d(i)$ is associated with processor $i$, where the distortion is computed, for $0 \le i \le N-1$. The parameter $d(i)$ accumulates the intermediate result as the codevector component $C_i(j)$ moves downward and the input data $x(j)$ moves to the right synchronously. After m clock cycles, $d(i)$ will consecutively contain the distortion between the input data vector and the $i$th codevector. To perform Equation (10), two variables, I and D, are required to record the index and distortion of the codevector of the current minimum distortion. The variable D is initialized to a large number. Both I and D enter processor 0 when $d(0)$ is determined. They move down the array one processor per clock cycle. At processor $i$, D is compared with $d(i)$: if $d(i) < D$, then $I = i$ and $D = d(i)$. As they flow out of processor N-1, I will contain the codevector index of the minimum distortion, representing the coded data.
For continuous data encoding, the next data vector, with its own pair of I and D, follows right after the current data vector, so that the data are continuously pumped into the array. This can be achieved by cycling the codevector components $C_i(j)$ into processor $i$ as the input data flows into the array. Each $d(i)$ is reset after the vector distortion is determined.
This systolic architecture has N processors and N codevectors, and each codevector has m components. With it, the encoding speed is increased by a factor of N over a single processor architecture. The pipeline latency is N+m clock cycles. The throughput rate is constant at 1 pixel/clock for any vector dimension and codebook size. Since N is typically chosen to be large to attain good reproduced image quality, a large number of processors is required. Therefore, in accordance with the present invention, a high throughput VQ encoder can be realized with minimal hardware by combining tree-searched VQ and systolic processing.
Systolic Architecture for Tree-Searched VQ
Equation (3) shows that the tree-searched VQ encoder is in effect a series of full-searched VQ encoders. The key is to correctly address the next level subcodebook. This can be realized by tagging the index of the current tree level $l$ to the indices of the previous tree levels $1, 2, \ldots, l-1$. The combined indices are then used to address the next level subcodebook $l+1$.
A systolic architecture for the tree-searched VQ is essentially a concatenation of L systolic arrays of the full-searched VQ, where L is the number of tree levels. Each stage $l$ corresponds to one tree level $l$. The codevectors of each subcodebook are arranged as follows. Codevector components $C_{i_1 \cdots i_l}(j)$ are allocated to processor $i_l$ of the $l$th stage array. There are $N_1 \cdots N_{l-1} m$ codevector components in each processor of the $l$th stage array. During the VQ encoding, the codevector components are addressed by the combined indices of the previous stages, $i_1 \cdots i_{l-1}$. For this pipeline architecture, the $l$th stage contains $N_l$ processors, which in total is
$$ \sum_{l=1}^{L} N_l $$

processors. The pipeline latency is

$$ \sum_{l=1}^{L} \bigl(N_l + m\bigr) $$

clock cycles. The system throughput rate is 1 pixel/clock, constant for any tree-structured codebook.
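The data flow of the FIG. 3 full-search array can be made concrete with a clock-by-clock model in Python. This is a behavioural sketch, not a hardware description. It assumes a weighted squared-error distortion per Equation (9) and uses hypothetical integer data. The function name is illustrative. A tree-searched pipeline, as described above, would chain L such stages.

```python
from typing import List, Optional, Sequence, Tuple

def systolic_full_search(x: Sequence[int], codebook: List[Sequence[int]],
                         w: Optional[Sequence[int]] = None) -> Tuple[int, float, int]:
    """Clock-by-clock behavioural model of the FIG. 3 array for one vector.

    x(j) reaches processor i at clock t = i + j (moving right one processor
    per clock) together with C_i(j), so d(i) is complete at clock i + m - 1.
    The (I, D) pair enters processor 0 when d(0) is complete and moves down
    one processor per clock. Returns (I, D, clocks simulated).
    """
    N, m = len(codebook), len(x)
    weights = list(w) if w is not None else [1] * m   # w(j) of Equation (9); assumed 1
    d = [0] * N                                        # accumulators d(i)
    I, D = -1, float("inf")                            # D starts as a large number
    t = 0
    while True:
        for i in range(N):
            j = t - i
            if 0 <= j < m:                             # skewed input valid at processor i
                d[i] += weights[j] * (x[j] - codebook[i][j]) ** 2
        p = t - (m - 1)                                # processor holding (I, D) this clock
        if 0 <= p < N and d[p] < D:                    # if d(i) < D: I = i, D = d(i)
            I, D = p, d[p]
        t += 1
        if p == N - 1:                                 # (I, D) leaves processor N-1
            return I, D, t

codebook = [[0, 0, 0], [5, 5, 5], [9, 9, 9], [2, 8, 4]]
print(systolic_full_search([3, 7, 4], codebook))  # (3, 2, 6)
```

This simplified model finishes after N + m - 1 clocks. The text quotes N + m cycles for the hardware pipeline, and the one-cycle difference is an artifact of the model.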
Systolic Architecture for Binary Tree-Searched Raw Codebook VQ
A systolic architecture for the raw codebook binary tree- 
searched VQ (RCVQ) defined by Equation (6) is shown in 
FIG. 4. In FIG. 4, the blocks $d_l(0)$ and $d_l(1)$ are distortion computation elements for implementing Equation (6), and CP(0) and CP(1) are elements for comparison of the distortion. Buffer elements 1 delay the input data sufficiently to keep the data flow through the pipeline of distortion computation elements synchronized with the concatenated indices used to address the next stage $l+1$ codebooks. The
preferred organization of each stage will be described more 
fully in the next sections. 
The input data sequence continuously flows into the array. 
Note that at each stage the data vector is compared with two 
codevectors in memory. After the index of the current tree 
stage (level) is obtained, it is tagged to the indices of the 
previous tree stages (levels) to address the next stage (level) 
subcodebook. The index is attained at a rate of one bit per 
stage. At the end of the array, the concatenated indices, n=L 
bits in length, are formed to represent the coded data. 
Since n=L for the binary tree-searched VQ, the overall 
system requires 2n processors. The pipeline latency equals 
n(2+m) clock cycles. The input data rate is 1 pixel per clock 
cycle, and the output data rate is n bits per m clock cycles. 
Therefore, the output data rate is effectively reduced by a 
factor of Km/n, the compression ratio. This systolic archi- 
tecture of FIG. 4 only requires a small number of processors 
compared to the full-searched VQ scheme. It has the advan- 
tages of modularity, regular data flow, simple interconnec- 
tion, localized communication, simple global control, and 
parallel/pipelined processing such that it is well suited for 
VLSI implementation. 
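The binary tree search of Equation (5) is sketched below in Python. Each index bit is computed either from the raw codevector pair (Equation (6)) or from the stored differences δ(j) and Δ (Equations (7) and (8)). The dictionary layout keyed by the index prefix is an assumption made for illustration, as is the tie rule (pick C_0 when the distortions are equal). Because Equation (7) is an exact identity, both paths should produce the same n-bit index.

```python
import random
from typing import Dict, List, Tuple

Prefix = Tuple[int, ...]                                   # index bits i_1 ... i_{l-1}
RawTree = Dict[Prefix, Tuple[List[int], List[int]]]        # prefix -> (C_0, C_1)
DiffTree = Dict[Prefix, Tuple[List[int], float]]           # prefix -> (delta, Delta)

def to_difference_tree(raw: RawTree) -> DiffTree:
    """Store delta(j) = C_0(j) - C_1(j) and Delta (Equation (8)) per pair."""
    out: DiffTree = {}
    for prefix, (c0, c1) in raw.items():
        delta = [a - b for a, b in zip(c0, c1)]
        big_delta = sum(a * a - b * b for a, b in zip(c0, c1)) / 2
        out[prefix] = (delta, big_delta)
    return out

def rcvq_bit(x: List[int], c0: List[int], c1: List[int]) -> int:
    # Equation (6); ties pick C_0 (an assumed tie rule).
    d0 = sum(xj * xj + a * a - 2 * xj * a for xj, a in zip(x, c0))
    d1 = sum(xj * xj + b * b - 2 * xj * b for xj, b in zip(x, c1))
    return 0 if d0 <= d1 else 1

def dcvq_bit(x: List[int], delta: List[int], big_delta: float) -> int:
    # Equation (7): [D(x,C_0) - D(x,C_1)]/2 = Delta - sum_j x(j) delta(j).
    half_diff = big_delta - sum(xj * dj for xj, dj in zip(x, delta))
    return 0 if half_diff <= 0 else 1

def btsvq_encode(x: List[int], raw: RawTree, diff: DiffTree, n: int,
                 use_difference: bool) -> int:
    """Equation (5): one index bit per tree level, tagged onto the prefix."""
    prefix: Prefix = ()
    for _ in range(n):
        if use_difference:
            bit = dcvq_bit(x, *diff[prefix])
        else:
            bit = rcvq_bit(x, *raw[prefix])
        prefix += (bit,)            # combined index addresses the next level
    code = 0
    for bit in prefix:              # concatenated index i_1 i_2 ... i_n
        code = (code << 1) | bit
    return code

def random_tree(n: int, m: int, rng: random.Random) -> RawTree:
    """Hypothetical random codevector pairs for every node of an n-level tree."""
    tree: RawTree = {}
    for level in range(n):
        for p in range(2 ** level):
            prefix = tuple((p >> (level - 1 - b)) & 1 for b in range(level))
            tree[prefix] = ([rng.randrange(256) for _ in range(m)],
                            [rng.randrange(256) for _ in range(m)])
    return tree

rng = random.Random(0)
raw = random_tree(n=4, m=16, rng=rng)
diff = to_difference_tree(raw)
vectors = [[rng.randrange(256) for _ in range(16)] for _ in range(100)]
print(all(btsvq_encode(v, raw, diff, 4, False) == btsvq_encode(v, raw, diff, 4, True)
          for v in vectors))  # True
```

The DCVQ path needs one inner product and one comparison per level, instead of two distortion sums. This matches the halving of complexity claimed for the DCVQ.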
Preferred Design of Systolic Binary Tree-Searched Raw 
Codebook VQ 
This section details an example of a preferred RCVQ design that lends itself to VLSI implementation for EOS on-board SAR applications. The example uses a 10-bit codebook with a 4x4 pixel vector dimension. For 8-bit pixels, this results in a maximum compression ratio of 12.8:1 (Km/n = 8 x 16/10). Limited flexibility in compression ratio can be realized by varying the vector dimension. The mean square error criterion is chosen as the distortion measure. FIG. 5 illustrates the major functional blocks of a systolic binary tree-searched VQ encoder: the processing element (PE) array 20, the VQ codebook memory banks 21 and an array controller 22. All of these are under synchronized control of an EOS Control and Data System (CDS) 23. So are a SAR processor 24, which presents the serial pixels in digital form, and a downlink packetizer 25, which forms packets of VQ data for transmission to a ground station.
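The text notes that the compression ratio is tuned mainly through the vector dimension. That trade-off, and the pipeline latency n(2 + m) from the systolic RCVQ, can be tabulated with simple arithmetic. The sketch assumes K = 8 bits per pixel (8-bit image data) and the 10-bit codebook of the preferred design. The 2x2 and 8x8 block sizes are hypothetical alternatives.

```python
def compression_ratio(K: int, m: int, n: int) -> float:
    """Km/n: K bits/pixel times m pixels/vector, divided by the n-bit index."""
    return K * m / n

def btsvq_latency(n: int, m: int) -> int:
    """Pipeline latency n(2 + m) clock cycles of the systolic BTSVQ."""
    return n * (2 + m)

K, n = 8, 10                      # assumed 8-bit pixels; 10-bit codebook from the text
for side in (2, 4, 8):            # hypothetical square block sizes
    m = side * side
    print(f"{side}x{side}: {compression_ratio(K, m, n):.1f}:1, "
          f"latency {btsvq_latency(n, m)} clocks")
# 2x2: 3.2:1, latency 60 clocks
# 4x4: 12.8:1, latency 180 clocks
# 8x8: 51.2:1, latency 660 clocks
```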
Detailed RCPE Design 
The PE array 20 performs the distortion computation of the VQ algorithm. For a VQ encoder with an n-bit codebook, this can be realized by n identical PEs. FIG. 6 shows a functional block diagram of a PE for an RCVQ. It is designed to compute the mean square error distortion between an input data vector and each codevector pair.
The distortion computation of the raw codebook processing element (RCPE) consists primarily of two mean square error operations. During the VQ encoding, the codevector pair components are addressed by the combined indices of the previous PEs, $i_1 i_2 \cdots i_{l-1}$. An accumulator accumulates the intermediate result as the codevector pair components $C_0(j)$ and $C_1(j)$ move downward and the input data $x(j)$ moves to the right synchronously. After m clock cycles, the accumulator will consecutively contain the mean square errors $d_0$ and $d_1$ between the input data vector x and the selected codevector pair.
The index generator compares the distortion measurements $d_0$ and $d_1$: if $d_0 \le d_1$, then $i_l = 0$; otherwise $i_l = 1$. Index $i_l$ is tagged to the indices of the previous tree levels to correctly address the next level subcodebook. At the end of the array, the concatenated indices, n bits in length, are formed to represent the coded data.
The RCPEs are identical and are designed to fit into a single chip using space-qualifiable 1.25 µm VLSI CMOS technology. An assessment based on a detailed logic diagram and VLSI layout of the RCPE shows a gate count of about 3,000 and a pin count of about 37. Both are well within the capability of present VLSI technology.
A detailed functional design of an RCPE is shown in FIG. 6. The pin names and definitions of the RCPE and the associated memory bank shown in FIGS. 7a and 7b are summarized in the following table:
Signal      Type     Description
MEMORY BANK
CLK         Input    System clock
HA-EN       Input    To enable the pixel address generator
HA-LD       Input    To load the hierarchical vector address
CS(10:1)    Input    To enable the memory module #1 to #10
A(E:0)      Input    System address bus
WA          Input    To select either system address or hierarchical encoding address
R/W         Input    To select either memory read or memory write
OE          Input    Tri-state output control
D(15:0)     Input    System data bus
Dn(15:0)    Output   16-bit output port of subcodebook #n
PROCESSING ELEMENT
DC(15:0)    Input    Codevector pairs from subcodebook module
CLK         Input    System clock (at pixel rate)
DI(7:0)     Input    8-bit input image data
DO(7:0)     Output   8-bit 16-stage pipelined image data
Hn          Output   Index of vector generated at PE#n
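The RCPE behaviour described above can be sketched as a behavioural model in Python. The model consumes one pixel per clock and accumulates $d_0$ and $d_1$ over m clocks. Its index generator then appends one bit to the address prefix used for the next subcodebook module. Several details are simplifying assumptions: the class name, the flat per-level memory layout (pair address = previous index bits read as an integer) and the sequential, non-overlapped chaining of PEs. The hardware pipelines successive vectors through the PEs concurrently.

```python
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[Sequence[int], Sequence[int]]    # (C_0, C_1) for one index prefix

class RCPE:
    """Behavioural (not RTL) model of the raw-codebook PE for one tree level.

    memory[a] is the codevector pair addressed by the previous PEs' index
    bits read as an integer a (a hypothetical flat layout of the module).
    """

    def __init__(self, memory: List[Pair], m: int) -> None:
        self.memory, self.m = memory, m
        self.acc = [0, 0]   # accumulated squared errors d_0 and d_1
        self.j = 0          # which vector component arrives this clock

    def clock(self, pixel: int, prefix: int) -> Optional[int]:
        """One pixel per clock; after m clocks return the extended prefix."""
        c0, c1 = self.memory[prefix]
        self.acc[0] += (pixel - c0[self.j]) ** 2
        self.acc[1] += (pixel - c1[self.j]) ** 2
        self.j += 1
        if self.j < self.m:
            return None
        bit = 0 if self.acc[0] <= self.acc[1] else 1   # index generator
        self.acc, self.j = [0, 0], 0                     # reset for the next vector
        return (prefix << 1) | bit                       # tag i_l onto i_1 ... i_{l-1}

def encode_stream(pixels: Sequence[int], pes: List[RCPE], m: int) -> List[int]:
    """Run each m-pixel vector through the PEs level by level (not overlapped)."""
    codes = []
    for start in range(0, len(pixels) - m + 1, m):
        vector = pixels[start:start + m]
        prefix = 0
        for pe in pes:
            out: Optional[int] = None
            for pixel in vector:
                out = pe.clock(pixel, prefix)
            assert out is not None       # a full vector always yields a bit
            prefix = out
        codes.append(prefix)
    return codes

pes = [RCPE([([0, 0], [10, 10])], m=2),
       RCPE([([0, 4], [4, 0]), ([6, 10], [10, 6])], m=2)]
print(encode_stream([9, 7, 1, 3], pes, m=2))  # [3, 0]
```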
Detailed Memory Bank Design 
The memory bank is composed of subcodebook memory modules, each storing a VQ subcodebook. FIGS. 7a and 7b show a detailed functional design of the memory bank 21 in FIG. 5. For the binary tree-searched VQ, the n-bit codebook is divided into n (= L for binary tree-searched VQ) hierarchical levels. The codevectors in each level $l$ are stored in their corresponding memory module $l$. The size of the memory module $l$ is $2^l mK$ bits ($= 2^{l+7}$ bits for the preferred design with m = 16 and K = 8). The total size of the memory bank is $(2^{n+1}-2)mK$ bits ($= 2^{n+8} - 2^8$ bits). Although the modules of the memory bank differ in size, they assume a regular structure in terms of memory cell design. To enable the programmability of the codebook, the memory bank can be accessed in both read and write modes by the host system 23 of FIG. 5 via the array controller 22 during the initialization. During VQ encoding operation, each memory module can only be accessed to read or write by its associated RCPE.
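The memory-size formulas can be checked numerically for the preferred 10-bit, 4x4 design. K = 8 bits per pixel is assumed from the 8-bit image data. The DCVQ figure uses the $(2^n - 1)[m(K+1) + (2K + \log m)]$ expression given earlier and assumes the logarithm is base 2. The function names are illustrative.

```python
import math

def rcvq_module_bits(l: int, m: int, K: int) -> int:
    """Memory module l holds 2**l raw codevectors of m K-bit components."""
    return (2 ** l) * m * K

def rcvq_total_bits(n: int, m: int, K: int) -> int:
    """Sum over levels 1..n, i.e. (2**(n+1) - 2) m K bits."""
    return sum(rcvq_module_bits(l, m, K) for l in range(1, n + 1))

def dcvq_total_bits(n: int, m: int, K: int) -> int:
    """(2**n - 1)[m(K+1) + (2K + log m)], assuming log is base 2."""
    return (2 ** n - 1) * (m * (K + 1) + 2 * K + int(math.log2(m)))

n, m, K = 10, 16, 8
print(rcvq_total_bits(n, m, K), 2 ** 18 - 2 ** 8)  # 261888 261888
print(dcvq_total_bits(n, m, K))                    # 167772
```

The raw-codebook total is just under $2^{18}$ bits. The difference codebook needs less storage because it keeps one pair of differences per tree node rather than two full codevectors.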
The total size in terms of the primitive memory cell for a 10-bit codebook is about $2^{18}$ bits.
Systolic Architecture for Binary Tree-Searched Difference Codebook VQ
FIGS. 8 and 9 show the architecture of the array for the difference-codebook BTSVQ. The input data vector sequence continuously flows into the array. For the difference-codebook BTSVQ, at each stage the inner product between the input data vectors and the difference codevectors is computed and compared with the second-order difference codewords. After the index of the current tree level is obtained, it is tagged to the indices of the previous tree levels to address the next level subcodebook. The index is attained at a rate of one bit per stage. At the end of the array, the concatenated indices of n-bit length are formed and represent the coded data of the corresponding input data vector.
The array controller 22 receives parameters from the host system via the on-board SAR processor to set up the BTSVQ encoder. It also provides status data for the EOS system to do housekeeping. It provides the interface timing to upload/download the data among the VQPEs, SAR processor 24 and downlink formatter 25. It also generates timing and signals to operate the VQPEs 22. The PE array controller is implemented with a programmable logic array device and several data buffers. Due to the localized data/control flow of systolic array processors, the array controller logic is simple.
In this difference codebook BTSVQ, each VQPE corresponds to one of several binary tree levels, such as the ten levels 1 through 10 in the example to be described. The major functional blocks of each VQPE 1, 2, ..., n of a BTSVQ shown in FIG. 8 are a subcodebook memory 26, a distortion computation data path 27 and an index generator 32.
For the DCPE of a BTSVQ, an n-bit codebook is divided and converted into n difference subcodebooks. The first-order and second-order differences of each codevector pair in level $l$ are stored in the subcodebook as shown in FIG. 9. The size of the difference subcodebook memory of the DCPE at level $l$ is $2^{l-1}[m(K+1) + (2K + \log m)]$ bits.
Referring to FIG. 9, the distortion computing datapath 27 of the DCPE design is primarily an inner product operator, which is much simpler than the distortion calculator of the RCPE. During the VQ encoding, the difference-codevector
To meet the light-weight, small-volume, and low-power requirements, VLSI technology is preferred for the implementation of the DCPE of FIG. 9, as shown in FIG. 10. The building blocks include a pipeline buffer 30, one ID register 31, multiplexers 32, 33 and 34, static RAM array 35, complementor 36, multiplier array 37, carry save adder 38, and comparator 39. The on-chip static RAM array 35 includes a 512x9 RAM and a 32x20 RAM, which are used to store the difference subcodebook up to level 6. For levels 7 to 10, an additional external subcodebook memory is required for each level. An external memory interface serves levels 7-10; it is represented by an input EXTCD(8:0) from external memory and is enabled by an input EXTCDEN. This interface is built as part of each DCPE to support a 10-level systolic BTSVQ encoder with a common VLSI design for each DCPE.
To enable the programmability, the subcodebook memory 35 can be read out of and written into by the host system via the controller 20 (FIG. 8) during the setup mode. In the encoding mode, each subcodebook memory can only be read out of and written into by its associated PE. In the setup mode, the first-order codevector differences $\delta$ are stored into the subcodebook memory 35. Meanwhile, the second-order codevector differences $\Delta$ are entered and stored in a threshold register 40 of each PE. In the encoding mode, the input vectors, DI(7:0), are received from the on-board SAR processor 24 via the array controller 22. Each DCPE then computes an inner product between the input vectors and the codevector differences. The inner product is stored in a register 41 and compared with the second-order codevector differences $\Delta$ stored in the threshold register 40 at the rising edge of a vector clock VCLK. A one-bit index is generated at level $l$ and concatenated with the index bits of the previous PEs for lower levels to address the next level $l+1$ subcodebook. The concatenated index bits of the last PE thus formed represent the coded data for the input data vector x. The pin names and definitions of the DCPE are summarized in the following table:
Signal      Type     Description
VCLK        Input    Vector clock
PCLK1       Input    Pixel clock (phase 1)
PCLK2       Input    Pixel clock (phase 2)
WRCD*       Input    Write enable of subcodebook (active low)
components y e  addressed by the combined indices of the 50 D1 (7:0) 
vcLK 
AB (8:O) Input 9-bit system address bus for subcodebook 
memory 
memory 
(19:o) Input 20-bit system data bus for subcodebook 
Input 8-bit input image data 
DO (7:O) Output 8-bit 16-stage pipelined image data 
WRCSD* previous PES, i,, i, . . . iZ-,. An accumulator accumulates the 
intermediate result as the difference-codevector component E ~ C D  ( ~ ~ 0 )  Input 9-bit codeword from the external subcode- 
6(i) moves downward and the input data xu) moves to the 
right synchronously. After m clock cycles, the accumulator EXTCDEN Input TO enable to accept 
will consecutively contain the inner product A between the 55 Ap (3,0) 
input data vector x and the selected difference codevector. IDP ( 8 : ~  
Input Write enable of threshold register 
book memory 
EXTCD (8:O) 
Address of pixel elements of vectors 
9.bit concatenated indices &om previous 
PES 
Input 
Input 
The index generator compares the 21h order difference 
codeyord A wjth the distortion measurement A .  If AZA, ID ( g o )  Output 10-bit concatenated inhces 
then i,l else i$. Index i, is tagged to the indices of the 
previous tree levels to correctly address the next level 60 Fault Tolerance Design 
subcodebook. At the end of the array, the concatenated For a space mission, it is reasonable to assume a 5 to 10 
indices, n bits in length, are formed to represent the coded year unmaintained mission life with a processor reliability 
data. The comparator-based index generator makes it easy to goal well above 0.95. Afault tolerant architecture is required 
perform error detection for PE. However, the subtracter- to achieve these goals. By combination of architectural fault 
based index generator has simpler hardware. 65 tolerance and inherent error detection capability, a highly 
Preferred Design of Systolic Binary Tree-Searched Differ- reliable VQ encoder can be attained, such as by a pro- 
ence Codebook VQ grammed diagnostic routine initiated by the control and data 
5,477,221 
11 12 
system which supervises the SAR processor, VQ compres- 
sor and downlink packetizer. When a fault is detected in any 
one PE, a “fault” signal is generated and associated with the 
PE suffering a fault. 
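The reliability figures given later in this section ($R^{10}$ for ten PEs, $R^{11} + 11R^{10}(1-R)$ with one spare) can be reproduced with a short calculation. This is an illustrative sketch only: it assumes independent, identically reliable PEs and treats fault detection and the RS-A/RS-B switches as failure-free, which the text does not state. The function names are hypothetical.

```python
from math import comb


def series_reliability(r: float, n: int = 10) -> float:
    """n PEs that must all work (no spare): R**n."""
    return r ** n


def k_of_n_reliability(r: float, n: int, k: int) -> float:
    """Probability that at least k of n independent PEs, each working with probability r, work."""
    return sum(comb(n, i) * r ** i * (1 - r) ** (n - i) for i in range(k, n + 1))


def one_spare_reliability(r: float, n_active: int = 10) -> float:
    """n_active PEs plus one spare survive if at most one of the n_active + 1 PEs fails:
    R**(n+1) + (n+1) * R**n * (1 - R)."""
    return k_of_n_reliability(r, n_active + 1, n_active)


if __name__ == "__main__":
    r = 0.95
    plain = series_reliability(r)
    spared = one_spare_reliability(r)
    print(f"10 PEs, no spare: {plain:.2f}")              # 0.60
    print(f"10 PEs + 1 spare: {spared:.2f}")             # 0.90
    print(f"relative gain:    {spared / plain - 1:.0%}")  # 50%
```

Dividing the two expressions gives $R + 11(1-R) = 11 - 10R$, which is exactly 1.5 at $R = 0.95$; that is where the 50% figure comes from.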
As shown in FIG. 11, the linear systolic array of the VQ encoder is augmented with a spare processing element SPE at the end of the array and with dynamic reconfiguration switches (RS). Two switch designs, type RS-A and type RS-B, are presented to support the fault tolerance reconfiguration. If there is a permanent fault in any active PE, the faulted PE will be detected and bypassed by a type RS-B switch at its output. Meanwhile, the spare processing element SPE at the end of the array will be activated by type RS-A switches for all PEs downstream in the array. The spare processing element SPE is bypassed by a type RS-B switch at its output until called upon to serve. At that time, the VQ codebooks of the PEs are all switched, starting with the PE having the fault, thus shifting each PE codebook to the next PE of the array in the direction from the input end to the output end of the PE array. The reconfiguration switches are controlled by a “fault” signal stored in an array register by the diagnostic subroutine system, which conducts the tests for detection of a faulty PE during the set-up time before encoding SAR data for transmission.
In detecting a fault, a single-unit (such as an adder) fault model may be used, where it is assumed that at most one unit suffers a fault within a given period of time, which will be reasonably short compared with the mean time between failures. Since effective error detecting and correcting schemes, such as parity and Hamming codes, exist for communication lines and memories, failures in these parts can be readily detected and corrected by those methods. The fault model concentrates on the permanent failures of a PE.
Two basic mechanisms can be applied to detecting faults in this type of system: on-line concurrent error detection and periodic self-test. On-line single error correction for arithmetic operations can be accomplished by arithmetic codes such as AN code or residue code. For the EOS SAR processor, temporary distortion of images due to transient faults may be tolerable. Hence errors, if any, can be detected by periodic self-test, which is performed during power-up and periodically during operation by temporarily halting compression of data. For the dual data path (RCPE) implementation, each PE is tested by applying the same input data and codevector to both its paths and using the comparator to determine whether the two results are equal. If they are not equal, a permanent or a transient fault may exist in the PE. To determine whether it is a transient fault or a permanent fault, the same input and codevector are reapplied following the first detection of error. If the two data paths still generate different results, a permanent fault has been detected, and reconfiguration is needed to avoid the faulty PE.
For the DCPE design, predetermined test inputs are applied, since there is only one data path, and precomputed results corresponding to the inputs need to be stored. The comparator then compares the generated results with the stored values. If the two are the same, the PE is fault-free; otherwise, the same input is reapplied to find out whether it is a permanent or transient fault. Following the location of the faulty PE, the spare PE is switched in to maintain the size of the PE array.
The hardware overhead of the self-test and reconfiguration scheme is about 20%. At the PE level, the overhead hardware includes two reconfiguration switches, one multiplexer, two registers, two comparators, one flag register, one n-input OR gate, one control line, n input lines, and one output line. At the PE array level, only one spare PE is required.
It has been shown that error correction using arithmetic code is also cost effective. The encoding introduces redundant bits in the number representation, and a proportional hardware increase takes place in the register array and data path. The estimated hardware overhead is from 20% to 40%, which should be able to fit in the PE of available die size 300 mils.
The reliability improvement can be assessed as follows. If each PE has a reliability of $R$, then the reliability of 10 PEs is $R^{10}$. For the reconfigurable array with one spare PE, the reliability becomes $R^{11} + 11R^{10}(1-R)$. For example, if $R = 0.95$, the reliability of the nonredundant PE array is 0.60, while the reliability of the redundant array is 0.90. This represents a 50% increase in reliability.
Conclusion
Although particular embodiments of the invention have been described and illustrated herein, it is recognized that modifications and variations may readily occur to those skilled in the art. Consequently, it is intended that the claims be interpreted to cover such modifications and variations.
We claim:
1. In a systolic-array image processing system, a full-searched vector quantizer for data compression comprising an array of N processors, with N distortion parameters, d(i), one for each processor, and N codevectors stored in a codebook, where each stored codevector $C_i$ comprises m components $C_i(0), \ldots, C_i(m-1)$, said array of N processors processing input image data vectors and codevectors to generate said distortion parameters as a weighted sum of scalar distortion in accordance with the following equation:
$$d(i) = D(x, C_i) = \sum_{j=0}^{m-1} w(j)\, D\big(x(j), C_i(j)\big),$$
where d(i) is the distortion between an input vector x and a stored codevector $C_i$, which is the ith codevector of the codebook, and $D(x, C_i)$ is said distortion parameter as a function of said input vector x and said stored codevector $C_i$ of the ith processor for $0 \le i \le N-1$ and $0 \le j \le m-1$, where x(j) represents the jth component of the input data vector x, $C_i(j)$ is the jth component of the ith codevector $C_i$, w(j) being the weighting factor in the distortion measure, and wherein an index, i, of said codevector $C_i$ of minimum distortion, $i = \min^{-1} d(i)$, represents a vector quantization coded data of an input vector, where said index corresponds to the ith processor, $0 \le i \le N-1$.
2. In a systolic-array image processing system, a tree-searched vector quantizer for image data compression comprising a series of L systolic arrays of $N_l$ identical processors, and a plurality L of levels of subcodebooks, where l is the tree level index from 1 to L, one level of subcodebooks for each of L systolic arrays, and means for successively comparing an input vector sequence $x^{[k]}$ with stored codevectors in subcodebook levels in search for an output coded data sequence $i^{[k]}$ of minimum distortion in accordance with the following equations:
$$i_1^{[k]} = \min^{-1}_{0 \le i_1 \le N_1 - 1} D\big(x^{[k]}, C_{i_1}\big)$$
$$i_2^{[k]} = \min^{-1}_{0 \le i_2 \le N_2 - 1} D\big(x^{[k]}, C_{i_1^{[k]} i_2}\big)$$
$$\vdots$$
$$i_L^{[k]} = \min^{-1}_{0 \le i_L \le N_L - 1} D\big(x^{[k]}, C_{i_1^{[k]} \cdots i_{L-1}^{[k]} i_L}\big)$$
$$i^{[k]} = i_1^{[k]}\, i_2^{[k]} \cdots i_L^{[k]}$$
where $x^{[k]}$ is an input data vector sequence, k represents the time index of said sequence, and the codevector notation is: $C_{i_1}$ for level 1, $C_{i_1 i_2}$ for level 2, and so forth with $C_{i_1 i_2 \cdots i_L}$ for level L, and $i_1$ is a codevector index for tree level-1 subcodebook, $i_1 i_2 \cdots i_L$ is a codevector index for tree level-L subcodebook, $D(x^{[k]}, C_{i_1 \cdots})$ is a distortion function, and $i_1^{[k]}$ is a vector quantization coded output data sequence of said input vector sequence for binary tree level-1 encoding process, $N_l$ is the maximum value of said vector coded output data for tree level l, $1 \le l \le L$.
3. In a systolic-array image processing system, a binary tree-searched raw codebook vector quantizer for image data compression comprising a series of n systolic arrays of two identical processors and a plurality n of levels of subcodebooks, one level of subcodebooks for each systolic array for successively comparing an input vector sequence $x^{[k]}$ with selected codevector pairs in subcodebook levels in accordance with the following equations:
$$i_1^{[k]} = \min^{-1}_{i_1 = 0,1} D\big(x^{[k]}, C_{i_1}\big)$$
$$i_2^{[k]} = \min^{-1}_{i_2 = 0,1} D\big(x^{[k]}, C_{i_1 i_2}\big)$$
$$\vdots$$
$$i_n^{[k]} = \min^{-1}_{i_n = 0,1} D\big(x^{[k]}, C_{i_1 i_2 \cdots i_n}\big)$$
$$i^{[k]} = i_1^{[k]}\, i_2^{[k]} \cdots i_n^{[k]}$$
where $x^{[k]}$ is an input vector sequence and k represents the time index of said sequence, and the codevector notation is: $C_{i_1}$ for binary level 1, $C_{i_1 i_2}$ for binary level 2, and so forth with $C_{i_1 i_2 \cdots i_n}$ for level n, $i_1$ is codevector index for binary tree level-1 subcodebook, $i_1 i_2$ is codevector index for binary tree level-2 subcodebook, and so forth, $i_1 i_2 \cdots i_n$ is codevector index for binary tree level-n subcodebook, n is the level number of the deepest binary tree, $D(x^{[k]}, C_{\cdots})$ is a distortion function, $i_1^{[k]}$ is a vector quantization coded output data sequence of said input vector sequence for binary tree level-1 encoding process and l is a binary tree level index from 1 to n, $i^{[k]}$ is a vector quantization coded output data sequence of said input vector sequence for overall encoding process.
4. In a systolic-array image processing system, a binary tree-searched difference codebook vector quantizer for image data compression comprising a series of systolic arrays of identical processors and a plurality of levels of subcodebooks, one level of subcodebooks for each systolic array for successively comparing an input vector with selected codevector pair difference in subcodebook levels in accordance with the following equations:
$$i_1^{[k]} = \min^{-1}_{i_1 = 0,1} D\big(x^{[k]}, C_{i_1}\big)$$
$$\vdots$$
where $x^{[k]}$ is an input vector sequence and k represents the time index of said sequence, and the codevector notation is: $C_{i_1}$ for binary level 1, $C_{i_1 i_2}$ for binary level 2, and so forth with $C_{i_1 i_2 \cdots i_n}$ for level n, $i_1$ is codevector index for binary tree level-1 subcodebook, $i_1 i_2$ is codevector index for binary tree level-2 subcodebook, and so forth, $i_1 i_2 \cdots i_n$ is codevector index for binary tree level-n subcodebook, n is the level number of the deepest binary tree, $D(x^{[k]}, C_{\cdots})$ is a distortion function, $i_1^{[k]}$ is a vector quantization coded output data sequence of said input vector sequence for binary tree level-1 encoding process, $i^{[k]}$ is a vector quantization coded output data sequence of said input vector sequence for overall encoding process, wherein said distortion functions between an input vector $x^{[k]}$ and codevectors at the same binary tree level $C_0$ and $C_1$ are
$$D\big(x^{[k]}, C_0\big) = \sum_{j=0}^{m-1}\big(x^{[k]}(j)^2 + C_0(j)^2\big) - 2\sum_{j=0}^{m-1} x^{[k]}(j)\,C_0(j)$$
$$D\big(x^{[k]}, C_1\big) = \sum_{j=0}^{m-1}\big(x^{[k]}(j)^2 + C_1(j)^2\big) - 2\sum_{j=0}^{m-1} x^{[k]}(j)\,C_1(j)$$
$x^{[k]}(j)$ being the jth component of the input vector, j being the component index of the input vector, m being the number of components of the input vector, $C_i(j)$ being the jth component of the ith codevector, i being the codevector index of the codebook, $C_0$ and $C_1$ being the codevector pair of the subcodebook in the same binary tree level, where codebook memory size is $(2^{n+1}-2)mK$ bits, n is a maximum number of tree levels, K is a number of bits per pixel, said distortion function between the input vector and codevectors at the same binary tree level $C_0$ and $C_1$ is simplified as
$$\frac{D\big(x^{[k]}, C_0\big) - D\big(x^{[k]}, C_1\big)}{2} = \sum_{j=0}^{m-1}\frac{C_0(j)^2 - C_1(j)^2}{2} - \sum_{j=0}^{m-1} x^{[k]}(j)\big[C_0(j) - C_1(j)\big] = A - \sum_{j=0}^{m-1} x^{[k]}(j)\,\delta(j)$$
and instead of saving $C_0(j)$ and $C_1(j)$, the terms
$$A = \sum_{j=0}^{m-1}\frac{C_0(j)^2 - C_1(j)^2}{2}$$
and $\delta(j) = C_0(j) - C_1(j)$ are stored in said codebook, where codebook memory size is $(2^n - 1)[m(K+1) + (2K + \log m)]$ bits.
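Claim 1 leaves the scalar distortion $D(x(j), C_i(j))$ and the weights $w(j)$ unspecified. The following is a minimal sequential sketch of the full-searched quantizer, assuming squared error as the scalar distortion (an illustrative choice the claim does not fix) and computing every $d(i)$ in a loop rather than in N systolic processors. The function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def squared_error(a: float, b: float) -> float:
    """Assumed scalar distortion D(x(j), C_i(j)); the claim does not fix one."""
    return (a - b) ** 2


def full_search_encode(
    x: Sequence[float],
    codebook: Sequence[Sequence[float]],
    weights: Sequence[float],
    scalar_distortion: Callable[[float, float], float] = squared_error,
) -> Tuple[int, List[float]]:
    """Return (i, d) with d[i] = sum_j w(j) * D(x(j), C_i(j)) and i = argmin_i d[i].

    In the claimed system each d[i] comes from the i-th processor; here the
    distortions are simply computed one after another.
    """
    d = [
        sum(w * scalar_distortion(xj, cj) for w, xj, cj in zip(weights, x, c))
        for c in codebook
    ]
    best = min(range(len(d)), key=d.__getitem__)  # the first minimum wins on ties
    return best, d


if __name__ == "__main__":
    codebook = [[0, 5], [5, 0]]
    x = [1, 1]
    print(full_search_encode(x, codebook, [2, 1]))  # (0, [18, 33])
    print(full_search_encode(x, codebook, [1, 2]))  # (1, [33, 18])
```

With the codebook held fixed, swapping the weights moves the selected index from 0 to 1, which is the role $w(j)$ plays in the weighted distortion measure.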
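The difference codebook of claim 4 and the DCPE rests on the identity $D(x,C_0) - D(x,C_1) = 2\big(A - \sum_j x(j)\delta(j)\big)$. Because of this identity, each level can choose between $C_0$ and $C_1$ from the stored $\delta(j)$ and $A$ alone, using one inner product and one comparison. The sketch below checks that a raw-codebook tree search and a difference-codebook tree search pick identical n-bit indices. It makes several assumptions: squared-error distortion, 8-bit integer pixels, and a tree stored as one list of $2^l$ codevectors per level, with index bits concatenated most-significant first. It also keeps $2A$ instead of $A$ so that every comparison stays in integer arithmetic. All function names are hypothetical.

```python
import random
from typing import List, Sequence

Vector = Sequence[int]


def raw_tree_encode(x: Vector, raw_levels: List[List[Vector]]) -> int:
    """Raw-codebook BTSVQ: at each level compare D(x, C0) with D(x, C1)."""
    index = 0
    for codevectors in raw_levels:  # level l holds 2**l codevectors
        c0, c1 = codevectors[2 * index], codevectors[2 * index + 1]
        d0 = sum((xj - a) ** 2 for xj, a in zip(x, c0))
        d1 = sum((xj - b) ** 2 for xj, b in zip(x, c1))
        bit = 1 if d0 >= d1 else 0  # ties go to 1, matching the A >= Delta rule
        index = (index << 1) | bit  # append i_l to i_1 ... i_{l-1}
    return index


def build_difference_levels(raw_levels: List[List[Vector]]):
    """For each codevector pair store delta(j) = C0(j) - C1(j) and 2A = sum_j (C0(j)^2 - C1(j)^2)."""
    diff_levels = []
    for codevectors in raw_levels:
        entries = []
        for prefix in range(len(codevectors) // 2):  # 2**(l-1) pairs at level l
            c0, c1 = codevectors[2 * prefix], codevectors[2 * prefix + 1]
            delta = [a - b for a, b in zip(c0, c1)]
            two_a = sum(a * a - b * b for a, b in zip(c0, c1))
            entries.append((delta, two_a))
        diff_levels.append(entries)
    return diff_levels


def difference_tree_encode(x: Vector, diff_levels) -> int:
    """Difference-codebook BTSVQ: one inner product and one comparison per level."""
    index = 0
    for entries in diff_levels:
        delta, two_a = entries[index]  # addressed by the previous index bits
        inner = sum(xj * dj for xj, dj in zip(x, delta))  # Delta
        bit = 1 if two_a >= 2 * inner else 0  # A >= Delta  ->  i_l = 1
        index = (index << 1) | bit
    return index


def random_tree(n_levels: int, m: int, rng: random.Random) -> List[List[Vector]]:
    """Random 8-bit codevectors: 2**l of them at each level l = 1..n_levels."""
    return [[[rng.randrange(256) for _ in range(m)] for _ in range(2 ** l)]
            for l in range(1, n_levels + 1)]


if __name__ == "__main__":
    rng = random.Random(1)
    tree = random_tree(n_levels=4, m=16, rng=rng)
    diffs = build_difference_levels(tree)
    for _ in range(1000):
        x = [rng.randrange(256) for _ in range(16)]
        assert raw_tree_encode(x, tree) == difference_tree_encode(x, diffs)
    print("raw and difference searches agree on 1000 random vectors")
```

The agreement is exact rather than approximate because $d_0 - d_1$ equals $2A - 2\Delta$ term by term, so both searches resolve every comparison, ties included, the same way.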
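The memory sizes in the DCPE description and in the claims can be tabulated directly. The sketch assumes K = 8 bits per pixel and m = 16 pixels per vector, with log m taken base 2 and m a power of two. These values are inferred from the 8-bit DI(7:0) pixel bus and the 4-bit AP(3:0) pixel address in the pin table and are not stated outright. With them, level 6 needs 32 pairs of (16·9 + 20) bits each. That is exactly the 512×9 plus 32×20 on-chip RAM given for static RAM array 35. The function names are hypothetical.

```python
from math import log2


def pair_bits(m: int, k: int) -> int:
    """Bits per codevector pair in a difference subcodebook: m first-order
    differences delta(j) of K+1 bits each (one extra bit, presumably for the sign)
    plus one second-order term A of 2K + log2(m) bits. Assumes m is a power of two."""
    return m * (k + 1) + (2 * k + int(log2(m)))


def difference_level_bits(level: int, m: int, k: int) -> int:
    """Level l holds 2**(l-1) pairs: 2^(l-1) [m(K+1) + (2K + log m)] bits."""
    return 2 ** (level - 1) * pair_bits(m, k)


def difference_codebook_bits(n: int, m: int, k: int) -> int:
    """All n levels: (2^n - 1) [m(K+1) + (2K + log m)] bits."""
    return (2 ** n - 1) * pair_bits(m, k)


def raw_codebook_bits(n: int, m: int, k: int) -> int:
    """Raw codebook: (2^(n+1) - 2) m K bits, since level l stores 2**l codevectors."""
    return (2 ** (n + 1) - 2) * m * k


if __name__ == "__main__":
    M, K = 16, 8  # assumed: 16-pixel vectors, 8-bit pixels
    print(difference_level_bits(6, M, K), 512 * 9 + 32 * 20)  # 5248 5248
    print(difference_codebook_bits(10, M, K))                 # 167772
    print(raw_codebook_bits(10, M, K))                        # 261888
```

Under these assumed values the ten-level difference codebook needs about 64% of the bits of a raw codebook of the same depth. Summing the per-level formula over levels 1 to n reproduces the closed form $(2^n - 1)[\ldots]$, because $\sum_{l=1}^{n} 2^{l-1} = 2^n - 1$.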
